# Download the very important and detailed data from ´´my´´ dissertation, such as the graphics I made of the weight variation of all mice (control and study groups), taking into account the animals' ages throughout the experimental period!! These data can be very useful to the world scientific community, of course!! There are relevant websites, links and images in this post!!

´´The world's people need to have very efficient research and projects resulting in very innovative drugs, vaccines, therapeutic substances, medical devices and other technologies according to the age, the genetics and the medical records of the person. So, the treatment, diagnosis and prognosis will be much more efficient and better, of course´´. Rodrigo Nunes Cal

• 20 March 2019 – It's time to talk about ditching statistical significance – Looking beyond a much used and abused measure would make science harder, but better.
• Moving to a World Beyond “p < 0.05” – Ronald L. Wasserstein, Allen L. Schirm & Nicole A. Lazar. Pages 1-19 | Published online: 20 Mar 2019
• Scientists rise up against statistical significance – Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.
• American Statistical Association
• YouTube video: Introduction to Module 6: Drug Discovery and Development – NIH Clinical Center
• YouTube video: Vaccines 101: How new vaccines are developed – nature video

Keywords: Innovation – Graphics – References – Time – Journals – World – Data – DNA – Times – Each Person – Protein – Cell – RNA – Biological Factors – Efficiency – Therapeutic Substances – Drugs – Vaccines – Substance Tests – Medical Devices – Detailed Research – Reference – Future – Science – Human Research – Animal Testing – Study and Research Centers – Laboratories – Pharmaceutical Industries – Investments – People – Person – Mice – Analysis – Age – Ages – Personalized Medicine – Importance – Unknown Diseases – Prevention – Diagnosis – Prognosis – Treatment – Fatal Diseases – Human Life Expectancy – Human Longevity

Relevant information related to book reading and its interconnected aspects in the school environment – Rodrigo Nunes Cal – Part 1 and Part 2 & Other Documents

Mestrado – Dissertation – Tabelas, Figuras e Gráficos – Tables, Figures and Graphics

My suggestion of a very important Project…

LISTA DE NOMES – PEOPLE´S NAMES – E-MAIL LIST – LISTA DE E-MAILS

Positive feedback from people about my dissertation, blog and YouTube channel via Facebook Messenger.

rodrigonunescal_dissert

GRUPO_AF2

GRUPO_AF1

GRUPO AFAN 2

GRUPO AFAN 1

CARCINÓGENO DMBA EM MODELOS EXPERIMENTAIS

Avaliação da influência da atividade física aeróbia e anaeróbia na progressão do câncer de pulmão experimental – Summary – Resumo

monografia-monograph

Textos que digitei – Texts I typed

microbiologia-famerp – Copia

nanomedicine–an-evolving-research-2155-983X-1000160(2)

european-respiratory-society

Impact_Fator-wise_Top100Science_Journals

redefine-statistical-signifcance

BIOGRAFIA-DOMINGO-MARCOLINO-BRAILE

valor-de-p

MÉTODOS DE DOSAGEM DO ÁCIDO HIALURÔNICO

Nanomedicina – Texto que escrevi. Nanomedicine – Text I typed(1)

Genes e Epilepsia

journal-of-nanomedicine–nanotechnology-flyer

As credenciais da ciência – Mestrado

A Psicossomática Psicanalítica

Apostila – Pubmed

Frases que digitei – Phrases I typed

DMBA 7,12Dimethylbenz[a]anthracene


• American Statistical Association

https://www.amstat.org/

The diffusion of relevant information and knowledge is always essential for a country's progress!!

• Mestrado – Dissertation – Tabelas, Figuras e Gráficos – Tables, Figures and Graphics – ´´My´´ Dissertation @ #energy #time #tempo #energia #life #health #saúde #vida #people #person #pessoa #pessoas #reading #leitura #vision #visão #Innovation #internet #history #história #Graphics #Gráficos #dissertation #dissertação #mestrado #research #pesquisa #details #detalhes #thoughts #thinking #reflection #reflexão #pensamentos #importance #communication #comunicações #importância #information #knowledge #informações #conhecimento #Ciência #Science #data #dados #diffusion #difusão #countries #países #cell #DNA #Célula #RNA #substances #drugs #vaccines #TherapeuticalSubstances #efficacy #eficiência #diagnosis #prognosis #treatment #disease #UnknownDiseases #name #times #influences #longevity #age #ages #test #humans #AnimalTesting #MedicalDevices #tests #laboratories #investments #researches #references #citations #ImpactFactor #journals

GRUPO_AF1 – ´´My´´ Dissertation

GRUPO AFAN 1 – ´´My´´ Dissertation

GRUPO_AF2 – ´´My´´ Dissertation

GRUPO AFAN 2 – ´´My´´ Dissertation

Slides – mestrado – ´´My´´ Dissertation

DMBA CARCINOGEN IN EXPERIMENTAL MODELS

´´We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.´´ https://www.nature.com/articles/s41562-017-0189-z Published: Daniel J. Benjamin, James O. Berger, […] Valen E. Johnson. Nature Human Behaviour, volume 2, pages 6–10 (2018)

Um mundo além de p < 0,05 « Sandra Merlo – Fonoaudiologia da Fluência

´´The world's people need to have very efficient research and projects resulting in very innovative drugs, vaccines, therapeutic substances, medical devices and other technologies according to the age, the genetics and the medical records of the person. So, the treatment, diagnosis and prognosis will be much more efficient and better, of course´´. Rodrigo Nunes Cal

´´These graphics I made based on data from ´´my´´ dissertation can be an excellent reference for many types of research and projects, of course!! For example, for research and projects that aim to analyze the influence of age in mice and humans (genetic studies, for example) on different stages of certain diseases, such as those unknown to the scientific community (unknown diseases) that may yet appear in the world. Other types of research and projects that can be done are those that analyze the effectiveness of different therapeutic substances in mice and humans across different diseases, in order to find the best therapeutic substance for each disease, taking age and genetics into account, for sure. Research aimed at evaluating the influence of age and genetics on therapeutic efficiency in experimental animals and in humans therefore has great value, as many people fall ill and end up dying even when subjected to excellent systems of diagnosis, treatment and prognosis, depending on the disease they have and its stage´´. Rodrigo Nunes Cal

´´Thus, by proving that age and genetics significantly influence the pathogenesis of diseases in experimental animals and in humans, it would be much more convenient and logical to have a group of individuals of a certain age group, family history and individual history of diseases acquired (or likely to be acquired over time) being treated with different types of drugs, therapeutic substances, vaccines, among others. Certainly, to carry out this type of research, a large financial investment in laboratories is necessary for the acquisition of high-efficiency scientific devices: for example, research at the genetic level and the evaluation of the influence of a person's age on the pathogenesis of diseases, even of diseases unknown to the scientific world that may appear in the future. The fact is that this type of research is very important because, as everyone knows, many diseases return to manifest in people even after excellent treatments of the most varied types. Another point to reflect on is the persistence of fatal diseases worldwide, unfortunately´´. Rodrigo Nunes Cal

´´Taking into account this type of influence of age, and consequently of genetics, on the pathophysiology of hereditary or non-hereditary diseases, an individual of a certain age will be submitted to a therapeutic treatment different from that of another individual with the same disease but a different age, since research indicates a significant influence of age on the pathogenesis of the disease; that individual will therefore use another type of therapeutic substance, drug, among others. This will certainly happen if the world scientific community scientifically proves these hypotheses through various research and scientific projects. People need excellent innovations so that human life expectancy increases ever more quickly. Therefore, the more personalized medicine becomes, the better it will be for each of us´´. Rodrigo Nunes Cal

´´As you know, the world needs very efficient research resulting in new therapeutic substances, medical devices, prevention, diagnostic and prognostic methods, drugs and vaccines, for example. Very detailed research is fundamental in this process. So, high-impact innovations are essential to the world's people!!´´ Rodrigo Nunes Cal

´´Science opens up a range of possibilities on the most varied subjects and at the most varied levels of difficulty. Therefore, a full understanding of the physiological mechanisms involved in the pathogenesis of diseases, especially the fatal diseases that occur in humans, is of paramount importance for humanity, for sure. Perhaps in the near future these mechanisms will be very well understood and, who knows, since many people dream of physical immortality, among them scientists, this great achievement may be attained. As you know, humanity urgently needs new scientific discoveries, especially highly effective medicines and vaccines against a large number of diseases, including unknown and silent diseases, and very efficient medical devices for the diagnosis, prognosis and treatment of diseases, among others. Certainly, we need hope and motivation to do great work, for a much better quality of life and an ever-increasing life expectancy´´. Rodrigo Nunes Cal

´´Having big goals, many of them said to be impossible, is very important because it can induce, in different ways and at different intensities in the world scientific community, the emergence of great scientific discoveries for all of humanity, significantly benefiting people's lives and significantly increasing human life expectancy. Thus, diseases known today as fatal and incurable can become curable, with the achievement of highly effective prevention, diagnosis, prognosis and treatment technologies´´. Rodrigo Nunes Cal

´´The very unknown path may be the best path to take in life´´. Rodrigo Nunes Cal

A world beyond p < 0.05
OCT 14, 2019 | BY SANDRA MERLO | STATISTICS

The exaggerated focus on p < 0.05 harms statistical reasoning and produces distortions in the scientific literature.

“The American Statistician” is a scientific journal on statistics published by the American Statistical Association. The journal has been published since 1947 and reached a high impact factor in 2018 (5.381).

In early 2019, the journal published a supplement to issue 73, titled “Statistical Inference in the 21st Century: A World Beyond p < 0.05”. The entire supplement is open access. Both the editors and the researchers who wrote for this supplement argue that it is incorrect to use only the p-value to decide what is and is not statistically significant. The American Statistical Association had already commented on the subject in 2016, through the publication of the manifesto “ASA Statement on p-Values and Statistical Significance”, in which it warned the scientific community about the misuse of the p-value. Now, through their scientific journal, they are issuing the same warning again.

The matter was also recently discussed in the journal Nature (“Scientists rise up against statistical significance”).

#### Definition

The letter “p” refers to “probability”. Thus, the “p-value” is the probability of obtaining a result as extreme as, or more extreme than, the one observed in the inferential test, assuming the null hypothesis is true.
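
To make this definition concrete, here is a minimal Python sketch (the helper names and sample z-values are ours, not the article's): for a z-statistic under a normal null hypothesis, the two-sided p-value is the probability of a result at least as extreme as the one observed.

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_sided_p_value(z: float) -> float:
    """Probability, under the null hypothesis, of a z-statistic
    at least as extreme (in absolute value) as the observed one."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))

# An observed z of 1.96 corresponds to the familiar p = 0.05 threshold.
print(round(two_sided_p_value(1.96), 3))   # 0.05
print(round(two_sided_p_value(1.00), 3))   # 0.317
```

Note that the p-value says nothing about the probability that the hypothesis itself is true, which is exactly the misreading the editors warn against.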

#### What not to do

Below is a literal translation of the five points on what not to do in relation to the p-value, set out on the first page of the editorial:

• “Don't base your conclusions solely on the fact that an association or effect was 'statistically significant' (i.e., the p-value exceeded some arbitrary threshold, such as p < 0.05).”
• “Don't believe that an association or effect exists just because it was statistically significant.”
• “Don't believe that an association or effect doesn't exist just because it wasn't statistically significant.”
• “Don't believe that the p-value provides the probability that chance produced the association or effect that was observed, or that it provides the probability that the hypothesis being tested is true.”
• “Don't conclude anything about scientific or practical importance based on statistical significance (or lack thereof).”

According to the editors, the mistakes above are frequently made, indicating misuse of statistics. For them, using “statistical significance” less means using statistical reasoning more.

#### “Statistically significant”

According to the editors, it is time to stop saying “statistically significant”, “statistically different”, “not significant”, “p < 0.05”, or adding asterisks to results tables.

Historically, the first to use the expression “statistically significant” was Francis Ysidro Edgeworth, in 1885. His intention was simply to indicate when a result deserved further consideration. But the expression only became well known when Ronald Aylmer Fisher used it in 1925. Since then, it has become widely used, though not in its original sense. According to the editors, what was a tool has turned into a tyranny. The label “statistically significant” does not imply that an association or effect is plausible, real or important (and the reverse applies to “not significant”). According to the editors, the label “statistically significant” is used to give an air of authority to findings, even though the difference between “significant” and “not significant” is not itself statistically significant (this is not a joke; see the article here).
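
The point that the difference between “significant” and “not significant” is not itself statistically significant can be illustrated with a small numerical sketch (the two studies, their effect sizes and standard errors are invented for illustration, and a normal approximation is assumed):

```python
from math import erf, sqrt

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic under a standard normal null."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

# Two hypothetical studies estimating the same effect:
effect_a, se_a = 0.25, 0.10   # z = 2.5 -> "statistically significant"
effect_b, se_b = 0.10, 0.10   # z = 1.0 -> "not significant"

p_a = two_sided_p(effect_a / se_a)
p_b = two_sided_p(effect_b / se_b)

# Yet the DIFFERENCE between the two estimates is itself far from significant:
diff = effect_a - effect_b
se_diff = sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(diff / se_diff)

print(f"study A: p = {p_a:.3f}; study B: p = {p_b:.3f}; A vs B: p = {p_diff:.3f}")
```

Labeling study A a discovery and study B a null result therefore reads far more into the data than they actually support.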

#### Distortions in scientific publications

One of the most important consequences of the misuse of the p-value, according to the editors, is that excessive concern with p < 0.05 has distorted scientific publishing: only studies whose results show p < 0.05 are deemed worth publishing. This issue has been raised since 1979 and is called the “file drawer problem”: studies with p > 0.05 results are shelved. The shelving of research ends up distorting the scientific literature, which over-represents “statistically significant” results. According to the editors, the “file drawer problem” compromises the integrity of scientific production.

Focusing on p < 0.05 also puts a study's results ahead of the importance of the question being asked and of the methods used to answer it. As a result, there are many studies with p < 0.05 results that nonetheless answer trivial questions or use inappropriate methods.

The editors also argue that, since the p-value is taken on faith, p < 0.05 buys many things: knowledge claims, publications, funding and promotions. “It doesn't matter if the p-value doesn't mean what people think; it is valuable because of what it can buy” (see here).

#### Statistical inference versus scientific inference

The editors point out that dichotomizing the p-value causes another confusion: the idea that statistical inference and scientific inference are equivalent. This confusion stems from a misunderstanding of what statistics is. The editors argue that statistics is the science that allows us to (1) learn from data and (2) measure, control and communicate uncertainty.

#### Accepting the uncertainties

The editors stress that researchers need to learn to accept that uncertainty exists and always will. But many researchers try to escape uncertainty by dichotomizing results into “significant” and “not significant”. It cannot be said that a “significant” result will always recur, nor that a “non-significant” result will never occur. Both are uncertain, because there is always variation in the data.

Statistical methods cannot remove uncertainty from the data. According to Andrew Gelman (here), statistics is often sold as an alchemy that turns randomness into certainty. In fact, statistical results are more uncertain than is usually acknowledged. Uncertainty should be part of reporting statistical results, expressed as error measures such as the standard error and the confidence interval. According to the editors, understanding that uncertainty is inevitable is an antidote to the false certainty of “statistical significance”.
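As a minimal illustration of reporting uncertainty rather than a verdict, here is a stdlib-Python sketch (the sample values are invented): the mean is reported together with its standard error and an approximate 95% confidence interval.

```python
import math
import statistics

# Invented weight gains (grams) for a hypothetical group of mice
data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3]

mean = statistics.mean(data)
# Standard error of the mean: sample standard deviation / sqrt(n)
se = statistics.stdev(data) / math.sqrt(len(data))

# Approximate 95% CI using the normal quantile 1.96
# (a t-quantile would be slightly wider for n = 8)
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se

print(f"mean = {mean:.2f} g, SE = {se:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Reporting the interval makes the residual uncertainty explicit: for these invented values, any effect roughly inside [0.91, 1.31] g remains compatible with the data.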

Institutional reform
The editors point out that changes are needed in academia, scientific journals, and funding agencies so that the p-value is no longer treated as the only result that matters. For example, the journal “Basic and Applied Social Psychology” banned the use of the p-value in its publications a few years ago. The editors also suggest distributing the “ASA Statement on p-Values and Statistical Significance” whenever researchers submit a paper or a review, to improve the understanding and practice of statistics.
What to do
The editors do not propose abandoning the p-value, but they do call for it to be used better. Some suggestions:

Exploratory studies (the majority of publications) could use only descriptive statistics, while more advanced, better-structured studies should use inferential statistics.
When reporting a test result, the p-value should always be given continuously (e.g., p = 0.08) and never dichotomized by an arbitrary threshold (“significant” versus “not significant”).
More than one test can be applied to check whether they yield similar p-values and, therefore, whether the same conclusions apply.
The p-value of a test can be complemented with other metrics, such as the second-generation p-value, the s-value, credibility analysis, and the risk of false positives (these metrics are explained in specific articles in the supplement to volume 73 of “The American Statistician”).
The p-value can be supplemented with graphics.
The p-value must be explained: the researcher must have the knowledge to explain in plain language what that value means in the context of their research (which includes considering the sample size and the effect size).
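To make the continuous-reporting and s-value suggestions concrete, a small stdlib-Python sketch (the p-values are invented for illustration). The s-value, s = -log2(p), re-expresses a p-value as bits of information against the test hypothesis:

```python
import math

def s_value(p: float) -> float:
    """Shannon information: bits of 'surprise' carried by a p-value."""
    return -math.log2(p)

# Report p continuously rather than as significant / not significant
for p in (0.049, 0.051, 0.008):
    print(f"p = {p:.3f}  ->  s = {s_value(p):.2f} bits")
```

On this scale p = 0.049 and p = 0.051 carry almost the same information (about 4.3 bits each), which is one way to see why the 0.05 cut-off is arbitrary.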

Reference
Wasserstein, R.L., Schirm, A.L. & Lazar, N.A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(sup1), 1-19.

https://en.wikipedia.org/wiki/P-value

https://jamanetwork.com/journals/jamapsychiatry/article-abstract/2739306

´´For more than 20 years, there have been rumbles about banning the P value,[1-3] because it is so often misused, miscomputed, and, even when used and computed correctly, misinterpreted. Consequently, findings that affect medical decision-making, policy, and research are often misled by the very research that is supposed to provide their evidence base.[4] Recently, such rumbles have increased.[5-7] Is now the time to ban the P value in all medical research?´´

https://jamanetwork.com/journals/jama/article-abstract/2676503

´´P values and accompanying methods of statistical significance testing are creating challenges in biomedical science and other disciplines. The vast majority (96%) of articles that report P values in the abstract, full text, or both include some values of .05 or less.[1] However, many of the claims that these reports highlight are likely false.[2] Recognizing the major importance of the statistical significance conundrum, the American Statistical Association (ASA) published[3] a statement on P values in 2016. The status quo is widely believed to be problematic, but how exactly to fix the problem is far more contentious. The contributors to the ASA statement also wrote 20 independent, accompanying commentaries focusing on different aspects and prioritizing different solutions. Another large coalition of 72 methodologists recently proposed[4] a specific, simple move: lowering the routine P value threshold for claiming statistical significance from .05 to .005 for new discoveries. The proposal met with strong endorsement in some circles and concerns in others.´´

Redefine statistical significance

Daniel J. Benjamin, James O. Berger, […] Valen E. Johnson. Nature Human Behaviour, volume 2, pages 6–10 (2018).

´´Ethics declarations – Competing interests: One of the 72 authors, Christopher Chambers, is a member of the Advisory Board of Nature Human Behaviour. Christopher Chambers was not a corresponding author and did not communicate with the editors regarding the publication of this article. The other authors declare no competing interests.´´

https://www.nature.com/articles/s41562-017-0189-z

https://www.nature.com/articles/s41562-017-0189-z#article-info

Article of my dissertation: The influence of physical activity in the progression of experimental lung cancer in mice.

Pathol Res Pract. 2012 Jul 15;208(7):377-81. doi: 10.1016/j.prp.2012.04.006. Epub 2012 Jun 8.

https://pubmed.ncbi.nlm.nih.gov/22683274/

Mestrado – ´´My´´ Dissertation – Tabelas, Figuras e Gráficos – Tables, Figures and Graphics. Very important, detailed and innovative data about ´´my´´ dissertation in the world – ´´My´´ Master´s Degree.

´´These graphics I made based on data from ´´my´´ dissertation can be an excellent reference for many types of research and projects, of course!! For example, for research and projects that aim to analyze the influence of age in mice and humans (genetic studies, for example) on different stages of certain diseases, such as those unknown to the scientific community (unknown diseases) that may appear in the world. Other types of research and projects that can be done are those that analyze the effectiveness of different therapeutic substances in mice and humans across different diseases, in order to identify the best therapeutic substance for each disease, taking age and genetics into account, for sure. Conducting research to evaluate the influence of age and genetics on therapeutic efficiency in experimental animals and in humans therefore has great value, as many people fall ill and end up dying even when subjected to excellent systems of diagnosis, treatment and prognosis, depending on the disease they have and its stage´´. Rodrigo Nunes Cal

´´Thus, by proving that age and genetics significantly influence the pathogenesis of diseases in experimental animals and in humans, it would be much more convenient and logical to have a group of individuals of a certain age group, with a given family and individual history of diseases acquired or likely to be acquired over time, treated with other types of drugs, therapeutic substances, vaccines, among others. Certainly, carrying out this type of research requires a large financial investment in laboratories to acquire high-efficiency scientific devices. For example, research at the genetic level and the evaluation of the influence of a person's age on the pathogenesis of diseases, even of diseases unknown to the scientific world that may appear in the future. This type of research is very important because, as everyone knows, many diseases return to manifest in people even after excellent treatments of the most varied types. Another point to reflect on is the persistence of fatal diseases worldwide, unfortunately´´. Rodrigo Nunes Cal

´´Taking into account this influence of age, and consequently of genetics, on the pathophysiology of hereditary or non-hereditary diseases, an individual of a certain age would receive a therapeutic treatment different from that of another individual with the same disease but a different age, provided research indicates a significant influence on the pathogenesis of the disease; that individual would therefore use another type of therapeutic substance, drug, among others. This will certainly happen if the world scientific community scientifically proves these hypotheses through various research projects. People need excellent innovations for human life expectancy to increase ever more quickly. Therefore, the more personalized medicine becomes, the better it will be for each of us´´. Rodrigo Nunes Cal

´´As you know, the world needs very efficient research resulting in new therapeutic substances, medical devices, prevention, diagnostic and prognostic methods, drugs and vaccines, for example. Very detailed research is fundamental in this process. So, high-impact innovations are essential to people worldwide!!´´ Rodrigo Nunes Cal

GRUPO_AF1 – ´´my´´ dissertation

GRUPO_AF2 – ´´my´´ dissertation

GRUPO AFAN 1 – ´´my´´ dissertation

GRUPO AFAN 2 – ´´my´´ dissertation

Slides – mestrado – ´´my´´ dissertation

CARCINÓGENO DMBA EM MODELOS EXPERIMENTAIS – DMBA CARCINOGEN IN EXPERIMENTAL MODELS

My suggestion of a very important, useful and efficient Project (Platform/Website) that can be made in the future. https://science1984.files.wordpress.com/2018/03/my-suggestion-of-a-very-important-project.pdf

List of people who gave me positive feedback by e-mail about a detailed, extensive e-mail list I compiled in 2015 of researchers from the 20 best universities in the world. I also sent the summary of the project to researchers by e-mail. https://science1984.wordpress.com/2018/07/15/list-of-people-who-gave-me-a-positive-feedback-by-e-mail-about-a-detailed-great-and-extensive-e-mail-list-i-did-in-2015-related-to-researchers-from-20-best-universities-of-the-world-videos-ve-2/

I was invited via Twitter to participate in the Science Advisory Board, an online community of scientific and medical professionals from all around the world. https://www.scienceboard.net/index.aspx?sec=def

Gratitude: https://www.youtube.com/watch?v=mWWDMV2W7QM I was invited by direct message to participate in 55 very important science events in 25 cities in different countries in less than one year because I participated in relevant research in Brazil.

´´Minha´´ monografia – ´´My´´ monograph – Doença de Chagas – Chagas disease

Feedback positivo de pessoas sobre minha dissertação pelo Messenger – Facebook. Positive feedback of people about my dissertation, blog and YouTube channel by Facebook – Messenger. Ano – Year: 2018

My Dissertation – ´´Minha´´ Dissertação

´´Science opens up a range of possibilities for the most varied subjects and the most varied levels of difficulties. Therefore, the full understanding of the physiological mechanisms involved in the pathogenesis of diseases, highlighting the fatal diseases that occur in humans, is of paramount importance for humanity, for sure. Maybe in the near future this understanding will be very well understood and who knows, as many people dream of physical immortality, among them scientists, this great achievement can be acquired. As you know, humanity urgently needs new scientific discoveries, highlighting the acquisition of highly effective medicines and vaccines against a large number of diseases, including unknown and silent diseases, obtaining very efficient medical devices for diagnosis, prognosis and treatment of diseases, among others. Certainly, we need to have hope and motivation to do great work and for a much better quality of life and an ever-increasing life expectancy´´. Rodrigo Nunes Cal

´´Having big goals, many of which are said to be impossible, is very important because it can induce, in different ways, in different intensities in the world scientific society, the emergence of great scientific discoveries for all of humanity, significantly benefiting people’s lives and causing the increase of human life expectancy significantly. Thus, diseases known today as being fatal and incurable, can become curable diseases, with the achievement of highly effective prevention, diagnosis, prognosis and treatment technologies´´. Rodrigo Nunes Cal

´´The very unknown path may be the best path to take in life´´. Rodrigo Nunes Cal

https://www.researchgate.net/publication/327711296_Nanomedicine_An_Evolving_Research – Note: I did not know about the credibility of this journal. I did not receive any money for it.

The influence of physical activity in the progression of experimental lung cancer in mice

Pathol Res Pract. 2012 Jul 15;208(7):377-81. doi: 10.1016/j.prp.2012.04.006. Epub 2012 Jun 8.

https://pubmed.ncbi.nlm.nih.gov/22683274/

https://www.sciencedirect.com/science/article/abs/pii/S0344033812001082?via%3Dihub

https://repositorio.unesp.br/handle/11449/8376

https://www.researchgate.net/publication/225286318_The_influence_of_physical_activity_in_the_progression_of_experimental_lung_cancer_in_mice

https://educapes.capes.gov.br/handle/11449/8376

## Abstract

Lung cancer is one of the most incident neoplasms in the world, representing the main cause of cancer mortality. Many epidemiologic studies have suggested that physical activity may reduce the risk of lung cancer; other works evaluate the effectiveness of physical activity in the suppression, remission and reduction of the recurrence of tumors. The aim of this study was to evaluate the effects of aerobic and anaerobic physical activity on the development and progression of lung cancer. Lung tumors were induced with a dose of 3 mg of urethane/kg in 67 male Balb-C mice, divided into three groups: group 1, 24 mice treated with urethane and without physical activity; group 2, 25 mice treated with urethane and subjected to aerobic free-swimming exercise; group 3, 18 mice treated with urethane and subjected to anaerobic swimming exercise with gradual loading of 5-20% of body weight. All the animals were sacrificed after 20 weeks, and lung lesions were analyzed. The median number of lesions (nodules and hyperplasia) was 3.0 for group 1, 2.0 for group 2 and 1.5 for group 3 (p=0.052). When comparing only the presence or absence of lesions, there was a decrease in the number of lesions in group 3 compared with group 1 (p=0.03) but not in relation to group 2. There were no metastases or other changes in other organs. Anaerobic physical activity, but not aerobic, diminishes the incidence of experimental lung tumors.

# Drug development

Drug development describes the process of developing a new drug that effectively targets a specific weakness in a cell. This process involves specific pre-clinical development and testing, followed by trials in humans to determine the efficacy of the drug.

https://www.nature.com/subjects/drug-development

• EDITORIAL
• 20 March 2019

# It’s time to talk about ditching statistical significance

Looking beyond a much used and abused measure would make science harder, but better.

https://www.nature.com/articles/d41586-019-00874-8

• COMMENT
• 20 March 2019

# Scientists rise up against statistical significance

Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.

https://www.nature.com/articles/d41586-019-00857-9

Editorial

# Moving to a World Beyond “p < 0.05”

Ronald L. Wasserstein, Allen L. Schirm & Nicole A. Lazar. Pages 1-19 | Published online: 20 Mar 2019

https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913

# P-VALUE, A TRUE TEST OF STATISTICAL SIGNIFICANCE? A CAUTIONARY NOTE

Ann Ib Postgrad Med. 2008 Jun; 6(1): 21–26.

Tukur Dahiru, (MBBS), FMCPH, Dip. HSM (Israel)

´´The medical journals are replete with P values and tests of hypotheses. It is a common practice among medical researchers to quote whether the test of hypothesis they carried out is significant or non-significant and many researchers get very excited when they discover a “statistically significant” finding without really understanding what it means. Additionally, while medical journals are florid of statement such as: “statistical significant”, “unlikely due to chance”, “not significant,” “due to chance”, or notations such as, “P > 0.05”, “P < 0.05”, the decision on whether to decide a test of hypothesis is significant or not based on P value has generated an intense debate among statisticians. It began among founders of statistical inference more than 60 years ago.[1-3] One contributing factor for this is that the medical literature shows a strong tendency to accentuate the positive findings; many researchers would like to report positive findings based on previously reported researches as “non-significant results should not take up” journal space.[4-7]´´

´´The idea of significance testing was introduced by R.A. Fisher, but over the past six decades its utility, understanding and interpretation has been misunderstood and generated so much scholarly writings to remedy the situation.[3] Alongside the statistical test of hypothesis is the P value, which similarly, its meaning and interpretation has been misused. To delve well into the subject matter, a short history of the evolution of statistical test of hypothesis is warranted to clear some misunderstanding.´´

´´A Brief History of P Value and Significance Testing
Significance testing evolved from the idea and practice of the eminent statistician, R.A. Fisher in the 1930s. His idea is simple: suppose we found an association between poverty level and malnutrition among children under the age of five years. This is a finding, but could it be a chance finding? Or perhaps we want to evaluate whether a new nutrition therapy improves nutritional status of malnourished children.´´

https://en.wikipedia.org/wiki/P-value

Apostila – Pubmed

A Psicossomática Psicanalítica – Psychoanalytic Psychosomatics

O Homem como Sujeito da Realidade da Saúde – Redação – The Human Being as a Subject of the Health Reality – Text I typed – Rodrigo Nunes Cal

ÁCIDO HIALURONICO – Rodrigo Nunes Cal

As credenciais da Ciência – The credentials of Science

Aula – Famerp – Resultados – Results

Frases (mensagens) que digitei – Phrases (messages) I typed

Nanomedicina – Texto que redigi. Nanomedicine – Text I typed

Nanomedicine – Rodrigo Nunes Cal

Genes e Epilepsia – Rodrigo Nunes Cal

MÉTODOS DE DOSAGEM DO ÁCIDO HIALURÔNICO – HYALURONIC ACID DOSAGE METHODS – Rodrigo Nunes Cal

Microbiology – Microbiologia – Famerp

https://www.indispensablesoma.info/who-is-marios-kyriazis

https://www.indispensablesoma.info/general-facts

https://www.visitcyprus.com/index.php/ru/news/195-kyriazis-the-museum-the-man-and-the-medicine

# Biological ageing and clinical consequences of modern technology

Biogerontology. 2017 Aug;18(4):711-715. doi: 10.1007/s10522-017-9680-1. Epub 2017 Feb 9.

https://pubmed.ncbi.nlm.nih.gov/28185019/

´´A disease is a particular abnormal condition that negatively affects the structure or function of all or part of an organism, and that is not due to any immediate external injury.[1][2] Diseases are often known to be medical conditions that are associated with specific symptoms and signs.[1] A disease may be caused by external factors such as pathogens or by internal dysfunctions. For example, internal dysfunctions of the immune system can produce a variety of different diseases, including various forms of immunodeficiency, hypersensitivity, allergies and autoimmune disorders.´´

´´When working out which methods to use, researchers should also focus as much as possible on actual problems. People who will duel to the death over abstract theories on the best way to use statistics often agree on results when they are presented with concrete scenarios. Researchers should seek to analyse data in multiple ways to see whether different analyses converge on the same answer. Projects that have crowdsourced analyses of a data set to diverse teams suggest that this approach can work to validate findings and offer new insights. In short, be sceptical, pick a good question, and try to answer it in many ways. It takes many numbers to get close to the truth.´´
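The advice to analyse data in multiple ways can be sketched with stdlib Python (the measurements are invented): the same exact permutation test is run twice, once on the difference in means and once on the difference in medians, to see whether the two analyses point the same way.

```python
import itertools
import statistics

# Invented measurements for two small groups
a = [4.1, 5.0, 4.8, 5.6, 5.2]
b = [3.2, 3.9, 4.4, 3.6, 4.0]

def perm_p(x, y, stat):
    """Exact two-sided permutation p-value for a group-difference statistic."""
    pooled = x + y
    observed = abs(stat(x) - stat(y))
    n = len(x)
    count = total = 0
    # Enumerate every way of relabelling the pooled data into two groups
    for idx in itertools.combinations(range(len(pooled)), n):
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(stat(g1) - stat(g2)) >= observed:
            count += 1
    return count / total

p_mean = perm_p(a, b, statistics.mean)      # difference in means
p_median = perm_p(a, b, statistics.median)  # difference in medians
print(f"p (means) = {p_mean:.3f}, p (medians) = {p_median:.3f}")
```

If the two analyses give similar p-values, that convergence is mild reassurance; if they disagree sharply, the data deserve a closer look.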

https://en.wikipedia.org/wiki/Disease

https://en.wikipedia.org/wiki/Research

https://en.wikipedia.org/wiki/Statistics

https://en.wikipedia.org/wiki/Science

* Link about my monograph: Induction of benzonidazole resistance in human isolates of Trypanosoma cruzi: https://science1984.wordpress.com/2018/07/15/my-monography-chagas-disease-research-in-laboratory-2/

https://www.nature.com/articles/d41586-019-00874-8

# Scientists rise up against statistical significance

Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.

When was the last time you heard a seminar speaker claim there was ‘no difference’ between two groups because the difference was ‘statistically non-significant’?

If your experience matches ours, there’s a good chance that this happened at the last talk you attended. We hope that at least someone in the audience was perplexed if, as frequently happens, a plot or table showed that there actually was a difference.

How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see? For several generations, researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome).[1] Nor do statistically significant results ‘prove’ some other hypothesis. Such misconceptions have famously warped the literature with overstated claims and, less famously, led to claims of conflicts between studies where none exists.

We have some proposals to keep scientists from falling prey to these misconceptions.

## Pervasive problem

Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.

For example, consider a series of analyses of unintended effects of anti-inflammatory drugs2. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).

It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us (see ‘Beware false conclusions’).

These and similar errors are widespread. Surveys of hundreds of articles have found that statistically non-significant results are interpreted as indicating ‘no difference’ or ‘no effect’ in around half (see ‘Wrong interpretations’ and Supplementary Information).

In 2016, the American Statistical Association released a statement in The American Statistician warning against the misuse of statistical significance and P values. The issue also included many commentaries on the subject. This month, a special issue in the same journal attempts to push these reforms further. It presents more than 40 papers on ‘Statistical inference in the 21st century: a world beyond P < 0.05’. The editors introduce the collection with the caution “don’t say ‘statistically significant’” [3]. Another article [4] with dozens of signatories also calls on authors and journal editors to disavow those terms.

We agree, and call for the entire concept of statistical significance to be abandoned.

We are far from alone. When we invited others to read a draft of this comment and sign their names if they concurred with our message, 250 did so within the first 24 hours. A week later, we had more than 800 signatories — all checked for an academic affiliation or other indication of present or past work in a field that depends on statistical modelling (see the list and final count of signatories in the Supplementary Information). These include statisticians, clinical and medical researchers, biologists and psychologists from more than 50 countries and across all continents except Antarctica. One advocate called it a “surgical strike against thoughtless testing of statistical significance” and “an opportunity to register your voice in favour of better scientific practices”.

We are not calling for a ban on P values. Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating for an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis [5].

## Quit categorizing

The trouble is human and cognitive more than it is statistical: bucketing results into ‘statistically significant’ and ‘statistically non-significant’ makes people think that the items assigned in that way are categorically different [6–8]. The same problems are likely to arise under any proposed statistical alternative that involves dichotomization, whether frequentist, Bayesian or otherwise.

Unfortunately, the false belief that crossing the threshold of statistical significance is enough to show that a result is ‘real’ has led scientists and journal editors to privilege such results, thereby distorting the literature. Statistically significant estimates are biased upwards in magnitude and potentially to a large degree, whereas statistically non-significant estimates are biased downwards in magnitude. Consequently, any discussion that focuses on estimates chosen for their significance will be biased. On top of this, the rigid focus on statistical significance encourages researchers to choose data and methods that yield statistical significance for some desired (or simply publishable) result, or that yield statistical non-significance for an undesired result, such as potential side effects of drugs — thereby invalidating conclusions.

The pre-registration of studies and a commitment to publish all results of all analyses can do much to mitigate these issues. However, even results from pre-registered studies can be biased by decisions invariably left open in the analysis plan [9]. This occurs even with the best of intentions.

Again, we are not advocating a ban on P values, confidence intervals or other statistical measures — only that we should not treat them categorically. This includes dichotomization as statistically significant or not, as well as categorization based on other statistical measures such as Bayes factors.

One reason to avoid such ‘dichotomania’ is that all statistics, including P values and confidence intervals, naturally vary from study to study, and often do so to a surprising degree. In fact, random variation alone can easily lead to large disparities in P values, far beyond falling just to either side of the 0.05 threshold. For example, even if researchers could conduct two perfect replication studies of some genuine effect, each with 80% power (chance) of achieving P < 0.05, it would not be very surprising for one to obtain P < 0.01 and the other P > 0.30. Whether a P value is small or large, caution is warranted.
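The replication claim is easy to check by simulation. A hypothetical Python sketch (ours, for illustration), assuming a simple normal test statistic whose mean of about 2.80 gives 80% power at the two-sided 0.05 level:

```python
import math
import random

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
MEAN_Z = 1.96 + 0.8416   # ~2.80: the shift giving 80% power at alpha = 0.05
TRIALS = 100_000

discordant = 0
for _ in range(TRIALS):
    p_a = two_sided_p(random.gauss(MEAN_Z, 1))  # perfect replication study A
    p_b = two_sided_p(random.gauss(MEAN_Z, 1))  # perfect replication study B
    # count pairs where one study lands below 0.01 while its twin exceeds 0.30
    if (p_a < 0.01 and p_b > 0.30) or (p_b < 0.01 and p_a > 0.30):
        discordant += 1

print(discordant / TRIALS)  # roughly 0.05: about one pair in twenty
```

Under these assumptions about 59% of such studies give P < 0.01 while roughly 4% give P > 0.30, so a sharply discordant pair turns up in roughly one replication pair in twenty — common enough that it should surprise no one.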

We must learn to embrace uncertainty. One practical way to do so is to rename confidence intervals as ‘compatibility intervals’ and interpret them in a way that avoids overconfidence. Specifically, we recommend that authors describe the practical implications of all values inside the interval, especially the observed effect (or point estimate) and the limits. In doing so, they should remember that all the values between the interval’s limits are reasonably compatible with the data, given the statistical assumptions used to compute the interval [7,10]. Therefore, singling out one particular value (such as the null value) in the interval as ‘shown’ makes no sense.

We’re frankly sick of seeing such nonsensical ‘proofs of the null’ and claims of non-association in presentations, research articles, reviews and instructional materials. An interval that contains the null value will often also contain non-null values of high practical importance. That said, if you deem all of the values inside the interval to be practically unimportant, you might then be able to say something like ‘our results are most compatible with no important effect’.

When talking about compatibility intervals, bear in mind four things. First, just because the interval gives the values most compatible with the data, given the assumptions, it doesn’t mean values outside it are incompatible; they are just less compatible. In fact, values just outside the interval do not differ substantively from those just inside the interval. It is thus wrong to claim that an interval shows all possible values.

Second, not all values inside are equally compatible with the data, given the assumptions. The point estimate is the most compatible, and values near it are more compatible than those near the limits. This is why we urge authors to discuss the point estimate, even when they have a large P value or a wide interval, as well as discussing the limits of that interval. For example, the authors above could have written: ‘Like a previous study, our results suggest a 20% increase in risk of new-onset atrial fibrillation in patients given the anti-inflammatory drugs. Nonetheless, a risk difference ranging from a 3% decrease, a small negative association, to a 48% increase, a substantial positive association, is also reasonably compatible with our data, given our assumptions.’ Interpreting the point estimate, while acknowledging its uncertainty, will keep you from making false declarations of ‘no difference’, and from making overconfident claims.
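One way to make this graded reading concrete is a compatibility (P value) curve: compute the two-sided P value for every candidate effect size, not only the null. A Python sketch using the reported numbers from the anti-inflammatory example, again assuming the normal approximation on the log risk-ratio scale (the function name and defaults are ours, for illustration):

```python
import math

def compatibility(rr0, rr_hat=1.20, lo=0.97, hi=1.48):
    """Two-sided P value testing a candidate risk ratio rr0 against the
    study's reported estimate (RR 1.2, 95% CI 0.97-1.48), via the normal
    approximation on the log scale. Larger p means more compatible."""
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    z = (math.log(rr_hat) - math.log(rr0)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

for rr0 in (0.97, 1.00, 1.20, 1.48):
    print(rr0, round(compatibility(rr0), 3))
```

Compatibility peaks at the point estimate (p = 1) and falls off smoothly: the null sits at p ≈ 0.09 and the interval limits near p ≈ 0.05, with no bright line anywhere along the curve.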

Third, like the 0.05 threshold from which it came, the default 95% used to compute intervals is itself an arbitrary convention. It is based on the false idea that there is a 95% chance that the computed interval itself contains the true value, coupled with the vague feeling that this is a basis for a confident decision. A different level can be justified, depending on the application. And, as in the anti-inflammatory-drugs example, interval estimates can perpetuate the problems of statistical significance when the dichotomization they impose is treated as a scientific standard.

Last, and most important of all, be humble: compatibility assessments hinge on the correctness of the statistical assumptions used to compute the interval. In practice, these assumptions are at best subject to considerable uncertainty [7,8,10]. Make these assumptions as clear as possible and test the ones you can, for example by plotting your data and by fitting alternative models, and then reporting all results.

Whatever the statistics show, it is fine to suggest reasons for your results, but discuss a range of potential explanations, not just favoured ones. Inferences should be scientific, and that goes far beyond the merely statistical. Factors such as background evidence, study design, data quality and understanding of underlying mechanisms are often more important than statistical measures such as P values or intervals.

The objection we hear most against retiring statistical significance is that it is needed to make yes-or-no decisions. But for the choices often required in regulatory, policy and business environments, decisions based on the costs, benefits and likelihoods of all potential consequences always beat those made based solely on statistical significance. Moreover, for decisions about whether to pursue a research idea further, there is no simple connection between a P value and the probable results of subsequent studies.

What will retiring statistical significance look like? We hope that methods sections and data tabulation will be more detailed and nuanced. Authors will emphasize their estimates and the uncertainty in them — for example, by explicitly discussing the lower and upper limits of their intervals. They will not rely on significance tests. When P values are reported, they will be given with sensible precision (for example, P = 0.021 or P = 0.13) — without adornments such as stars or letters to denote statistical significance and not as binary inequalities (P < 0.05 or P > 0.05). Decisions to interpret or to publish results will not be based on statistical thresholds. People will spend less time with statistical software, and more time thinking.

Our call to retire statistical significance and to use confidence intervals as compatibility intervals is not a panacea. Although it will eliminate many bad practices, it could well introduce new ones. Thus, monitoring the literature for statistical abuses should be an ongoing priority for the scientific community. But eradicating categorization will help to halt overconfident claims, unwarranted declarations of ‘no difference’ and absurd statements about ‘replication failure’ when the results from the original and replication studies are highly compatible. The misuse of statistical significance has done much harm to the scientific community and those who rely on scientific advice. P values, intervals and other statistical measures all have their place, but it’s time for statistical significance to go.

Nature 567, 305-307 (2019)

doi: https://doi.org/10.1038/d41586-019-00857-9

## References

1. Fisher, R. A. Nature 136, 474 (1935).
2. Schmidt, M. & Rothman, K. J. Int. J. Cardiol. 177, 1089–1090 (2014).
3. Wasserstein, R. L., Schirm, A. & Lazar, N. A. Am. Stat. https://doi.org/10.1080/00031305.2019.1583913 (2019).
4. Hurlbert, S. H., Levine, R. A. & Utts, J. Am. Stat. https://doi.org/10.1080/00031305.2018.1543616 (2019).
5. Lehmann, E. L. Testing Statistical Hypotheses 2nd edn, 70–71 (Springer, 1986).
6. Gigerenzer, G. Adv. Meth. Pract. Psychol. Sci. 1, 198–218 (2018).
7. Greenland, S. Am. J. Epidemiol. 186, 639–645 (2017).
8. McShane, B. B., Gal, D., Gelman, A., Robert, C. & Tackett, J. L. Am. Stat. https://doi.org/10.1080/00031305.2018.1527253 (2019).
9. Gelman, A. & Loken, E. Am. Sci. 102, 460–465 (2014).
10. Amrhein, V., Trafimow, D. & Greenland, S. Am. Stat. https://doi.org/10.1080/00031305.2018.1543137 (2019).

### SUPPLEMENTARY INFORMATION

1. Supp info for Amrhein et al Comment_data V2


Nature (Nature) ISSN 1476-4687 (online) ISSN 0028-0836 (print)


• EDITORIAL
• 20 March 2019

# It’s time to talk about ditching statistical significance

Looking beyond a much used and abused measure would make science harder, but better.


Fans of The Hitchhiker’s Guide to the Galaxy know that the answer to life, the Universe and everything is 42. The joke, of course, is that truth cannot be revealed by a single number.

And yet this is the job often assigned to P values: a measure of how surprising a result is, given assumptions about an experiment, including that no effect exists. Whether a P value falls above or below an arbitrary threshold demarcating ‘statistical significance’ (such as 0.05) decides whether hypotheses are accepted, papers are published and products are brought to market. But using P values as the sole arbiter of what to accept as truth can also mean that some analyses are biased, some false positives are overhyped and some genuine effects are overlooked.

Change is in the air. In a Comment in this week’s issue, three statisticians call for scientists to abandon statistical significance. The authors do not call for P values themselves to be ditched as a statistical tool — rather, they want an end to their use as an arbitrary threshold of significance. More than 800 researchers have added their names as signatories. A series of related articles is being published by the American Statistical Association this week (R. L. Wasserstein et al. Am. Stat. https://doi.org/10.1080/00031305.2019.1583913; 2019). “The tool has become the tyrant,” laments one article.

Statistical significance is so deeply integrated into scientific practice and evaluation that extricating it would be painful. Critics will counter that arbitrary gatekeepers are better than unclear ones, and that the more useful argument is over which results should count for (or against) evidence of effect. There are reasonable viewpoints on all sides; Nature is not seeking to change how it considers statistical analysis in evaluation of papers at this time, but we encourage readers to share their views (see go.nature.com/correspondence).

If researchers do discard statistical significance, what should they do instead? They can start by educating themselves about statistical misconceptions. Most important will be the courage to consider uncertainty from multiple angles in every study. Logic, background knowledge and experimental design should be considered alongside P values and similar metrics to reach a conclusion and decide on its certainty.

When working out which methods to use, researchers should also focus as much as possible on actual problems. People who will duel to the death over abstract theories on the best way to use statistics often agree on results when they are presented with concrete scenarios.

Researchers should seek to analyse data in multiple ways to see whether different analyses converge on the same answer. Projects that have crowdsourced analyses of a data set to diverse teams suggest that this approach can work to validate findings and offer new insights.

In short, be sceptical, pick a good question, and try to answer it in many ways. It takes many numbers to get close to the truth.

Nature 567, 283 (2019)

doi: https://doi.org/10.1038/d41586-019-00874-8


# Moving to a World Beyond “p < 0.05”

Ronald L. Wasserstein, Allen L. Schirm & Nicole A. Lazar. Pages 1–19 | Published online: 20 Mar 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

March 16, 2019

Some of you exploring this special issue of The American Statistician might be wondering if it’s a scolding from pedantic statisticians lecturing you about what not to do with p-values, without offering any real ideas of what to do about the very hard problem of separating signal from noise in data and making decisions under uncertainty. Fear not. In this issue, thanks to 43 innovative and thought-provoking papers from forward-looking statisticians, help is on the way.

## 1 “Don’t” Is Not Enough

There’s not much we can say here about the perils of p-values and significance testing that hasn’t been said already for decades (Ziliak and McCloskey 2008; Hubbard 2016). If you’re just arriving to the debate, here’s a sampling of what not to do:

• Don’t base your conclusions solely on whether an association or effect was found to be “statistically significant” (i.e., the p-value passed some arbitrary threshold such as p < 0.05).
• Don’t believe that an association or effect exists just because it was statistically significant.
• Don’t believe that an association or effect is absent just because it was not statistically significant.
• Don’t believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.
• Don’t conclude anything about scientific or practical importance based on statistical significance (or lack thereof).

Don’t. Don’t. Just…don’t. Yes, we talk a lot about don’ts. The ASA Statement on p-Values and Statistical Significance (Wasserstein and Lazar 2016) was developed primarily because after decades, warnings about the don’ts had gone mostly unheeded. The statement was about what not to do, because there is widespread agreement about the don’ts.

Knowing what not to do with p-values is indeed necessary, but it does not suffice. It is as though statisticians were asking users of statistics to tear out the beams and struts holding up the edifice of modern scientific research without offering solid construction materials to replace them. Pointing out old, rotting timbers was a good start, but now we need more.

Recognizing this, in October 2017, the American Statistical Association (ASA) held the Symposium on Statistical Inference, a two-day gathering that laid the foundations for this special issue of The American Statistician. Authors were explicitly instructed to develop papers for the variety of audiences interested in these topics. If you use statistics in research, business, or policymaking but are not a statistician, these articles were indeed written with YOU in mind. And if you are a statistician, there is still much here for you as well.

The papers in this issue propose many new ideas, ideas that in our determination as editors merited publication to enable broader consideration and debate. The ideas in this editorial are likewise open to debate. They are our own attempt to distill the wisdom of the many voices in this issue into an essence of good statistical practice as we currently see it: some do’s for teaching, doing research, and informing decisions.

Yet the voices in the 43 papers in this issue do not sing as one. At times in this editorial and the papers you’ll hear deep dissonance, the echoes of “statistics wars” still simmering today (Mayo 2018). At other times you’ll hear melodies wrapping in a rich counterpoint that may herald an increasingly harmonious new era of statistics. To us, these are all the sounds of statistical inference in the 21st century, the sounds of a world learning to venture beyond “p < 0.05.”

This is a world where researchers are free to treat “p = 0.051” and “p = 0.049” as not being categorically different, where authors no longer find themselves constrained to selectively publish their results based on a single magic number. In this world, where studies with “p < 0.05” and studies with “p > 0.05” are not automatically in conflict, researchers will see their results more easily replicated—and, even when not, they will better understand why. As we venture down this path, we will begin to see fewer false alarms, fewer overlooked discoveries, and the development of more customized statistical strategies. Researchers will be free to communicate all their findings in all their glorious uncertainty, knowing their work is to be judged by the quality and effective communication of their science, and not by their p-values. As “statistical significance” is used less, statistical thinking will be used more.

The ASA Statement on P-Values and Statistical Significance started moving us toward this world. As of the date of publication of this special issue, the statement has been viewed over 294,000 times and cited over 1700 times—an average of about 11 citations per week since its release. Now we must go further. That’s what this special issue of The American Statistician sets out to do.

To get to the do’s, though, we must begin with one more don’t.

## 2 Don’t Say “Statistically Significant”

The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way.

Regardless of whether it was ever useful, a declaration of “statistical significance” has today become meaningless. Made broadly known by Fisher’s use of the phrase (1925), Edgeworth’s (1885) original intention for statistical significance was simply as a tool to indicate when a result warrants further scrutiny. But that idea has been irretrievably lost. Statistical significance was never meant to imply scientific importance, and the confusion of the two was decried soon after its widespread use (Boring 1919). Yet a full century later the confusion persists.

And so the tool has become the tyrant. The problem is not simply use of the word “significant,” although the statistical and ordinary language meanings of the word are indeed now hopelessly confused (Ghose 2013); the term should be avoided for that reason alone. The problem is a larger one, however: using bright-line rules for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making (ASA statement, Principle 3). A label of statistical significance adds nothing to what is already conveyed by the value of p; in fact, this dichotomization of p-values makes matters worse.

For example, no p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical nonsignificance lead to the association or effect being improbable, absent, false, or unimportant. Yet the dichotomization into “significant” and “not significant” is taken as an imprimatur of authority on these characteristics. In a world without bright lines, on the other hand, it becomes untenable to assert dramatic differences in interpretation from inconsequential differences in estimates. As Gelman and Stern (2006) famously observed, the difference between “significant” and “not significant” is not itself statistically significant.

Furthermore, this false split into “worthy” and “unworthy” results leads to the selective reporting and publishing of results based on their statistical significance—the so-called “file drawer problem” (Rosenthal 1979). And the dichotomized reporting problem extends beyond just publication, notes Amrhein, Trafimow, and Greenland (2019): when authors use p-value thresholds to select which findings to discuss in their papers, “their conclusions and what is reported in subsequent news and reviews will be biased…Such selective attention based on study outcomes will therefore not only distort the literature but will slant published descriptions of study results—biasing the summary descriptions reported to practicing professionals and the general public.” For the integrity of scientific publishing and research dissemination, therefore, whether a p-value passes any arbitrary threshold should not be considered at all when deciding which results to present or highlight.

To be clear, the problem is not that of having only two labels. Results should not be trichotomized, or indeed categorized into any number of groups, based on arbitrary p-value thresholds. Similarly, we need to stop using confidence intervals as another means of dichotomizing (based on whether a null value falls within the interval). And, to preclude a reappearance of this problem elsewhere, we must not begin arbitrarily categorizing other statistical measures (such as Bayes factors).

Despite the limitations of p-values (as noted in Principles 5 and 6 of the ASA statement), however, we are not recommending that the calculation and use of continuous p-values be discontinued. Where p-values are used, they should be reported as continuous quantities (e.g., p = 0.08). They should also be described in language stating what the value means in the scientific context. We believe that a reasonable prerequisite for reporting any p-value is the ability to interpret it appropriately. We say more about this in Section 3.3.

To move forward to a world beyond “p < 0.05,” we must recognize afresh that statistical inference is not—and never has been—equivalent to scientific inference (Hubbard, Haig, and Parsa 2019; Ziliak 2019). However, looking to statistical significance for a marker of scientific observations’ credibility has created a guise of equivalency. Moving beyond “statistical significance” opens researchers to the real significance of statistics, which is “the science of learning from data, and of measuring, controlling, and communicating uncertainty” (Davidian and Louis 2012).

In sum, “statistically significant”—don’t say it and don’t use it.

## 3 There Are Many Do’s

With the don’ts out of the way, we can finally discuss ideas for specific, positive, constructive actions. We have a massive list of them in the seventh section of this editorial! In that section, the authors of all the articles in this special issue each provide their own short set of do’s. Those lists, and the rest of this editorial, will help you navigate the substantial collection of articles that follows.

Because of the size of this collection, we take the liberty here of distilling our readings of the articles into a summary of what can be done to move beyond “p < 0.05.” You will find the rich details in the articles themselves.

What you will NOT find in this issue is one solution that majestically replaces the outsized role that statistical significance has come to play. The statistical community has not yet converged on a simple paradigm for the use of statistical inference in scientific research—and in fact it may never do so. A one-size-fits-all approach to statistical inference is an inappropriate expectation, even after the dust settles from our current remodeling of statistical practice (Tong 2019). Yet solid principles for the use of statistics do exist, and they are well explained in this special issue.

We summarize our recommendations in two sentences totaling seven words: “Accept uncertainty. Be thoughtful, open, and modest.” Remember “ATOM.”

### 3.1 Accept Uncertainty

Uncertainty exists everywhere in research. And, just like with the frigid weather in a Wisconsin winter, there are those who will flee from it, trying to hide in warmer havens elsewhere. Others, however, accept and even delight in the omnipresent cold; these are the ones who buy the right gear and bravely take full advantage of all the wonders of a challenging climate. Significance tests and dichotomized p-values have turned many researchers into scientific snowbirds, trying to avoid dealing with uncertainty by escaping to a “happy place” where results are either statistically significant or not. In the real world, data provide a noisy signal. Variation, one of the causes of uncertainty, is everywhere. Exact replication is difficult to achieve. So it is time to get the right (statistical) gear and “move toward a greater acceptance of uncertainty and embracing of variation” (Gelman 2016).

Statistical methods do not rid data of their uncertainty. “Statistics,” Gelman (2016) says, “is often sold as a sort of alchemy that transmutes randomness into certainty, an ‘uncertainty laundering’ that begins with data and concludes with success as measured by statistical significance.” To accept uncertainty requires that we “treat statistical results as being much more incomplete and uncertain than is currently the norm” (Amrhein, Trafimow, and Greenland 2019). We must “countenance uncertainty in all statistical conclusions, seeking ways to quantify, visualize, and interpret the potential for error” (Calin-Jageman and Cumming 2019).

“Accept uncertainty and embrace variation in effects,” advise McShane et al. in Section 7 of this editorial. “[W]e can learn much (indeed, more) about the world by forsaking the false promise of certainty offered by dichotomous declarations of truth or falsity—binary statements about there being ‘an effect’ or ‘no effect’—based on some p-value or other statistical threshold being attained.”

We can make acceptance of uncertainty more natural to our thinking by accompanying every point estimate in our research with a measure of its uncertainty such as a standard error or interval estimate. Reporting and interpreting point and interval estimates should be routine. However, simplistic use of confidence intervals as a measurement of uncertainty leads to the same bad outcomes as use of statistical significance (especially, a focus on whether such intervals include or exclude the “null hypothesis value”). Instead, Greenland (2019) and Amrhein, Trafimow, and Greenland (2019) encourage thinking of confidence intervals as “compatibility intervals,” which use p-values to show the effect sizes that are most compatible with the data under the given model.

How will accepting uncertainty change anything? To begin, it will prompt us to seek better measures, more sensitive designs, and larger samples, all of which increase the rigor of research. It also helps us be modest (the fourth of our four principles, on which we will expand in Section 3.4) and encourages “meta-analytic thinking” (Cumming 2014). Accepting uncertainty as inevitable is a natural antidote to the seductive certainty falsely promised by statistical significance. With this new outlook, we will naturally seek out replications and the integration of evidence through meta-analyses, which usually requires point and interval estimates from contributing studies. This will in turn give us more precise overall estimates for our effects and associations. And this is what will lead to the best research-based guidance for practical decisions.

Accepting uncertainty leads us to be thoughtful, the second of our four principles.

### 3.2 Be Thoughtful

What do we mean by this exhortation to “be thoughtful”? Researchers already clearly put much thought into their work. We are not accusing anyone of laziness. Rather, we are envisioning a sort of “statistical thoughtfulness.” In this perspective, statistically thoughtful researchers begin above all else with clearly expressed objectives. They recognize when they are doing exploratory studies and when they are doing more rigidly pre-planned studies. They invest in producing solid data. They consider not one but a multitude of data analysis techniques. And they think about so much more.

#### 3.2.1 Thoughtfulness in the Big Picture

“[M]ost scientific research is exploratory in nature,” Tong (2019) contends. “[T]he design, conduct, and analysis of a study are necessarily flexible, and must be open to the discovery of unexpected patterns that prompt new questions and hypotheses. In this context, statistical modeling can be exceedingly useful for elucidating patterns in the data, and researcher degrees of freedom can be helpful and even essential, though they still carry the risk of overfitting. The price of allowing this flexibility is that the validity of any resulting statistical inferences is undermined.”

Calin-Jageman and Cumming (2019) caution that “in practice the dividing line between planned and exploratory research can be difficult to maintain. Indeed, exploratory findings have a slippery way of ‘transforming’ into planned findings as the research process progresses.” At the bottom of that slippery slope one often finds results that don’t reproduce.

Anderson (2019) proposes three questions that thoughtful researchers should ask when evaluating research results: What are the practical implications of the estimate? How precise is the estimate? And is the model correctly specified? The latter question leads naturally to three more: Are the modeling assumptions understood? Are these assumptions valid? And do the key results hold up when other modeling choices are made? Anderson further notes, “Modeling assumptions (including all the choices from model specification to sample selection and the handling of data issues) should be sufficiently documented so independent parties can critique, and replicate, the work.”

Drawing on archival research done at the Guinness Archives in Dublin, Ziliak (2019) emerges with ten “G-values” he believes we all wish to maximize in research. That is, we want large G-values, not small p-values. The ten principles of Ziliak’s “Guinnessometrics” are derived primarily from his examination of experiments conducted by statistician William Sealy Gosset while working as Head Brewer for Guinness. Gosset took an economic approach to the logic of uncertainty, preferring balanced designs over random ones and estimation of gambles over bright-line “testing.” Take, for example, Ziliak’s G-value 10: “Consider purpose of the inquiry, and compare with best practice,” in the spirit of what farmers and brewers must do. The purpose is generally NOT to falsify a null hypothesis, says Ziliak. Ask what is at stake, he advises, and determine what magnitudes of change are humanly or scientifically meaningful in context.

Pogrow (2019) offers an approach based on practical benefit rather than statistical or practical significance. This approach is especially useful, he says, for assessing whether interventions in complex organizations (such as hospitals and schools) are effective, and also for increasing the likelihood that the observed benefits will replicate in subsequent research and in clinical practice. In this approach, “practical benefit” recognizes that reliance on small effect sizes can be as problematic as relying on p-values.

Thoughtful research prioritizes sound data production by putting energy into the careful planning, design, and execution of the study (Tong 2019).

Locascio (2019) urges researchers to be prepared for a new publishing model that evaluates their research based on the importance of the questions being asked and the methods used to answer them, rather than the outcomes obtained.

#### 3.2.2 Thoughtfulness Through Context and Prior Knowledge

Thoughtful research considers the scientific context and prior evidence. In this regard, a declaration of statistical significance is the antithesis of thoughtfulness: it says nothing about practical importance, and it ignores what previous studies have contributed to our knowledge.

Thoughtful research looks ahead to prospective outcomes in the context of theory and previous research. Researchers would do well to ask, What do we already know, and how certain are we in what we know? And building on that and on the field’s theory, what magnitudes of differences, odds ratios, or other effect sizes are practically important? These questions would naturally lead a researcher, for example, to use existing evidence from a literature review to identify specifically the findings that would be practically important for the key outcomes under study.

Thoughtful research includes careful consideration of the definition of a meaningful effect size. As a researcher you should communicate this up front, before data are collected and analyzed. Afterwards is just too late; it is dangerously easy to justify observed results after the fact and to overinterpret trivial effect sizes as being meaningful. Many authors in this special issue argue that consideration of the effect size and its “scientific meaningfulness” is essential for reliable inference (e.g., Blume et al. 2019; Betensky 2019). This concern is also addressed in the literature on equivalence testing (Wellek 2017).

Thoughtful research considers “related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain…without giving priority to p-values or other purely statistical measures” (McShane et al. 2019).

Thoughtful researchers “use a toolbox of statistical techniques, employ good judgment, and keep an eye on developments in statistical and data science,” conclude Heck and Krueger (2019), who demonstrate how the p-value can be useful to researchers as a heuristic.

#### 3.2.3 Thoughtful Alternatives and Complements to P-Values

Thoughtful research considers multiple approaches for solving problems. This special issue includes some ideas for supplementing or replacing p-values. Here is a short summary of some of them, with a few technical details:

Amrhein, Trafimow, and Greenland (2019) and Greenland (2019) advise that null p-values should be supplemented with a p-value from a test of a pre-specified alternative (such as a minimal important effect size). To reduce confusion with posterior probabilities and better portray evidential value, they further advise that p-values be transformed into s-values (Shannon information, surprisal, or binary logworth) s = – log2(p). This measure of evidence affirms other arguments that the evidence against a hypothesis contained in the p-value is not nearly as strong as is believed by many researchers. The change of scale also moves users away from probability misinterpretations of the p-value.
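The transformation itself is simple; a short sketch (example p-values are illustrative):

```python
import math

def s_value(p):
    """Shannon information (surprisal, binary logworth) of a
    p-value, in bits: s = -log2(p)."""
    return -math.log2(p)

# p = 0.05 carries only about 4.3 bits of information against the
# hypothesis -- little more surprise than four fair coin tosses
# all landing heads (which has s = 4 bits).
for p in (0.05, 0.005, 0.25):
    print(f"p = {p}: s = {s_value(p):.2f} bits")
```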

Blume et al. (2019) offer a “second generation p-value (SGPV),” the characteristics of which mimic or improve upon those of p-values but take practical significance into account. The null hypothesis from which an SGPV is computed is a composite hypothesis representing a range of differences that would be practically or scientifically inconsequential, as in equivalence testing (Wellek 2017). This range is determined in advance by the experimenters. When the SGPV is 1, the data only support null hypotheses; when the SGPV is 0, the data are incompatible with any of the null hypotheses. SGPVs between 0 and 1 are inconclusive at varying levels (maximally inconclusive at or near SGPV = 0.5). Blume et al. illustrate how the SGPV provides a straightforward and useful descriptive summary of the data. They argue that it eliminates the disconnect between classical statistical significance and scientific relevance, lowers false discovery rates, and yields conclusions that are more likely to reproduce in subsequent studies.
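A rough sketch of the interval-overlap idea behind the SGPV (this follows the general form in Blume et al.; the intervals and null range below are hypothetical, and their published estimator includes refinements not shown here):

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    """Interval-overlap sketch of the second-generation p-value:
    the fraction of the interval estimate lying inside the
    pre-specified range of inconsequential effects, with a
    correction that caps very wide intervals at the maximally
    inconclusive value 0.5."""
    est_len = est_hi - est_lo
    null_len = null_hi - null_lo
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))
    correction = max(est_len / (2 * null_len), 1.0)
    return min(1.0, (overlap / est_len) * correction)

# Interval estimate entirely inside the inconsequential zone -> 1
print(sgpv(-0.1, 0.1, -0.2, 0.2))   # data support only null hypotheses
# Interval estimate entirely outside the zone -> 0
print(sgpv(0.5, 0.9, -0.2, 0.2))    # data incompatible with the null range
# Very wide interval containing the whole zone -> 0.5 (inconclusive)
print(sgpv(-2.0, 2.0, -0.2, 0.2))
```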

The “analysis of credibility” (AnCred) is promoted by Matthews (2019). This approach takes account of both the width of the confidence interval and the location of its bounds when assessing weight of evidence. AnCred assesses the credibility of inferences based on the confidence interval by determining the level of prior evidence needed for a new finding to provide credible evidence for a nonzero effect. If this required level of prior evidence is supported by current knowledge and insight, Matthews calls the new result “credible evidence for a non-zero effect,” irrespective of its statistical significance/nonsignificance.

Colquhoun (2019) proposes continuing the use of continuous p-values, but only in conjunction with the “false positive risk (FPR).” The FPR answers the question, “If you observe a ‘significant’ p-value after doing a single unbiased experiment, what is the probability that your result is a false positive?” It tells you what most people mistakenly still think the p-value does, Colquhoun says. The problem, however, is that to calculate the FPR you need to specify the prior probability that an effect is real, and it’s rare to know this. Colquhoun suggests that the FPR could be calculated with a prior probability of 0.5, the largest value reasonable to assume in the absence of hard prior data. The FPR found this way is in a sense the minimum false positive risk (mFPR); less plausible hypotheses (prior probabilities below 0.5) would give even bigger FPRs, Colquhoun says, but the mFPR would be a big improvement on reporting a p-value alone. He points out that p-values near 0.05 are, under a variety of assumptions, associated with minimum false positive risks of 20–30%, which should stop a researcher from making too big a claim about the “statistical significance” of such a result.
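A simplified version of the calculation (the “p ≤ α” form; Colquhoun's 20–30% figures come from the more refined “p-equals” calculation, which conditions on the observed p-value itself and gives noticeably higher risks near p = 0.05; the numbers below are illustrative):

```python
def false_positive_risk(alpha, power, prior):
    """P(no real effect | p <= alpha) for a single unbiased test,
    given the prior probability that the effect is real.
    Simplified 'p <= alpha' form of the calculation."""
    false_pos = alpha * (1 - prior)   # significant, but no real effect
    true_pos = power * prior          # significant, and effect is real
    return false_pos / (false_pos + true_pos)

# With the "most optimistic" prior of 0.5 and 80% power:
print(false_positive_risk(0.05, 0.80, 0.5))   # ~0.059
# With a less plausible hypothesis (prior 0.1), the risk is far larger:
print(false_positive_risk(0.05, 0.80, 0.1))   # ~0.36
```

Even this simplified form shows Colquhoun's central point: the FPR grows rapidly as the prior probability of a real effect shrinks.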

Benjamin and Berger (2019) propose a different supplement to the null p-value. The Bayes factor bound (BFB)—which under typically plausible assumptions is the value 1/(-ep ln p)—represents the upper bound of the ratio of data-based odds of the alternative hypothesis to the null hypothesis. Benjamin and Berger advise that the BFB should be reported along with the continuous p-value. This is an incomplete step toward revising practice, they argue, but one that at least confronts the researcher with the maximum possible odds that the alternative hypothesis is true—which is what researchers often think they are getting with a p-value. The BFB, like the FPR, often clarifies that the evidence against the null hypothesis contained in the p-value is not nearly as strong as is believed by many researchers.
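The bound is a one-line computation; a sketch:

```python
import math

def bayes_factor_bound(p):
    """Upper bound on the data-based odds of the alternative over
    the null implied by a p-value: BFB = 1 / (-e * p * ln p),
    valid for p < 1/e."""
    assert 0 < p < 1 / math.e
    return 1.0 / (-math.e * p * math.log(p))

# p = 0.05 corresponds, at most, to about 2.5-to-1 odds in favor of
# the alternative -- far weaker evidence than "significance" suggests.
print(round(bayes_factor_bound(0.05), 2))   # 2.46
```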

Goodman, Spruill, and Komaroff (2019) propose a two-stage approach to inference, requiring both a small p-value below a pre-specified level and a pre-specified sufficiently large effect size before declaring a result “significant.” They argue that this method has improved performance relative to use of dichotomized p-values alone.
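A sketch of such a two-stage rule (the thresholds here are placeholders; in practice both must be pre-specified for the study at hand):

```python
def two_stage_significant(p, effect, alpha=0.05, min_effect=0.5):
    """Declare a result 'significant' only if the p-value falls
    below a pre-specified level AND the estimated effect reaches a
    pre-specified practically meaningful magnitude."""
    return p < alpha and abs(effect) >= min_effect

print(two_stage_significant(0.01, 0.8))   # small p, large effect -> True
print(two_stage_significant(0.01, 0.1))   # small p, trivial effect -> False
```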

Gannon, Pereira, and Polpo (2019) have developed a testing procedure combining frequentist and Bayesian tools to provide a significance level that is a function of sample size.

Manski (2019) and Manski and Tetenov (2019) urge a return to the use of statistical decision theory, which they say has largely been forgotten. Statistical decision theory is not based on p-value thresholds and readily distinguishes between statistical and clinical significance.

Billheimer (2019) suggests abandoning inference about parameters, which are frequently hypothetical quantities used to idealize a problem. Instead, he proposes focusing on the prediction of future observables, and their associated uncertainty, as a means to improving science and decision-making.

#### 3.2.4 Thoughtful Communication of Confidence

Be thoughtful and clear about the level of confidence or credibility that is present in statistical results.

Amrhein, Trafimow, and Greenland (2019) and Greenland (2019) argue that the use of words like “significance” in conjunction with p-values and “confidence” with interval estimates misleads users into overconfident claims. They propose that researchers think of p-values as measuring the compatibility between hypotheses and data, and interpret interval estimates as “compatibility intervals.”

In what may be a controversial proposal, Goodman (2018) suggests requiring “that any researcher making a claim in a study accompany it with their estimate of the chance that the claim is true.” Goodman calls this the confidence index. For example, along with stating “This drug is associated with elevated risk of a heart attack, relative risk (RR) = 2.4, p = 0.03,” Goodman says investigators might add a statement such as “There is an 80% chance that this drug raises the risk, and a 60% chance that the risk is at least doubled.” Goodman acknowledges, “Although simple on paper, requiring a confidence index would entail a profound overhaul of scientific and statistical practice.”

In a similar vein, Hubbard and Carriquiry (2019) urge that researchers prominently display the probability the hypothesis is true or a probability distribution of an effect size, or provide sufficient information for future researchers and policy makers to compute it. The authors further describe why such a probability is necessary for decision making, how it could be estimated by using historical rates of reproduction of findings, and how this same process can be part of continuous “quality control” for science.

Being thoughtful in our approach to research will lead us to be open in our design, conduct, and presentation of it as well.

### 3.3 Be Open

We envision openness as embracing certain positive practices in the development and presentation of research work.

#### 3.3.1 Openness to Transparency and to the Role of Expert Judgment

First, we repeat oft-repeated advice: Be open to “open science” practices. Calin-Jageman and Cumming (2019), Locascio (2019), and others in this special issue urge adherence to practices such as public pre-registration of methods, transparency and completeness in reporting, shared data and code, and even pre-registered (“results-blind”) review. Completeness in reporting, for example, requires not only describing all analyses performed but also presenting all findings obtained, without regard to statistical significance or any such criterion.

Openness also includes understanding and accepting the role of expert judgment, which enters the practice of statistical inference and decision-making in numerous ways (O’Hagan 2019). “Indeed, there is essentially no aspect of scientific investigation in which judgment is not required,” O’Hagan observes. “Judgment is necessarily subjective, but should be made as carefully, as objectively, and as scientifically as possible.”

Subjectivity is involved in any statistical analysis, Bayesian or frequentist. Gelman and Hennig (2017) observe, “Personal decision making cannot be avoided in statistical data analysis and, for want of approaches to justify such decisions, the pursuit of objectivity degenerates easily to a pursuit to merely appear objective.” One might say that subjectivity is not a problem; it is part of the solution.

Acknowledging this, Brownstein et al. (2019) point out that expert judgment and knowledge are required in all stages of the scientific method. They examine the roles of expert judgment throughout the scientific process, especially regarding the integration of statistical and content expertise. “All researchers, irrespective of their philosophy or practice, use expert judgment in developing models and interpreting results,” say Brownstein et al. “We must accept that there is subjectivity in every stage of scientific inquiry, but objectivity is nevertheless the fundamental goal. Therefore, we should base judgments on evidence and careful reasoning, and seek wherever possible to eliminate potential sources of bias.”

How does one rigorously elicit expert knowledge and judgment in an effective, unbiased, and transparent way? O’Hagan (2019) addresses this, discussing protocols for eliciting expert knowledge in as unbiased and scientifically sound a way as possible. It is also important for such elicited knowledge to be examined critically; comparing it to actual study results is an important diagnostic step.

#### 3.3.2 Openness in Communication

Be open in your reporting. Report p-values as continuous, descriptive statistics, as we explain in Section 2. We realize that this leaves researchers without their familiar bright line anchors. Yet if we were to propose a universal template for presenting and interpreting continuous p-values we would violate our own principles! Rather, we believe that the thoughtful use and interpretation of p-values will never adhere to a rigid rulebook, and will instead inevitably vary from study to study. Despite these caveats, we can offer recommendations for sound practices, as described below.

In all instances, regardless of the value taken by p or any other statistic, consider what McShane et al. (2019) call the “currently subordinate factors”—the factors that should no longer be subordinate to “p < 0.05.” These include relevant prior evidence, plausibility of mechanism, study design and data quality, and the real-world costs and benefits that determine what effects are scientifically important. The scientific context of your study matters, they say, and this should guide your interpretation.

When using p-values, remember not only Principle 5 of the ASA statement: “A p-value…does not measure the size of an effect or the importance of a result” but also Principle 6: “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.” Despite these limitations, if you present p-values, do so for more than one hypothesized value of your variable of interest (Fraser 2019; Greenland 2019), such as 0 and at least one plausible, relevant alternative, such as the minimum practically important effect size (which should be determined before analyzing the data).
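For a simple z-statistic setting, testing more than one hypothesized value is straightforward (the estimate, standard error, and minimum important effect below are illustrative, and the latter must be fixed before seeing the data):

```python
import math

def two_sided_p(estimate, se, hypothesized):
    """Two-sided p-value for a hypothesized effect, assuming an
    approximately normal estimator."""
    z = abs(estimate - hypothesized) / se
    # Normal tail probability via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

estimate, se = 0.42, 0.20     # hypothetical study result
for h in (0.0, 0.5):          # null value and minimum important effect
    print(f"H: effect = {h}: p = {two_sided_p(estimate, se, h):.3f}")
```

Here the data yield a small p-value against zero yet are highly compatible with the minimum important effect, a nuance a single null p-value hides.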

Betensky (2019) also reminds us to interpret the p-value in the context of sample size and meaningful effect size.

Instead of p, you might consider presenting the s-value (Greenland 2019), which is described in Section 3.2. As noted in Section 3.1, you might present a confidence interval. Sound practices in the interpretation of confidence intervals include (1) discussing both the upper and lower limits and whether they have different practical implications, (2) paying no particular attention to whether the interval includes the null value, and (3) remembering that an interval is itself an estimate subject to error and generally provides only a rough indication of uncertainty given that all of the assumptions used to create it are correct and, thus, for example, does not “rule out” values outside the interval. Amrhein, Trafimow, and Greenland (2019) suggest that interval estimates be interpreted as “compatibility” intervals rather than as “confidence” intervals, showing the values that are most compatible with the data, under the model used to compute the interval. They argue that such an interpretation and the practices outlined here can help guard against overconfidence.

It is worth noting that Tong (2019) disagrees with using p-values as descriptive statistics. “Divorced from the probability claims attached to such quantities (confidence levels, nominal Type I errors, and so on), there is no longer any reason to privilege such quantities over descriptive statistics that more directly characterize the data at hand.” He further states, “Methods with alleged generality, such as the p-value or Bayes factor, should be avoided in favor of discipline- and problem-specific solutions that can be designed to be fit for purpose.”

Failing to be open in reporting leads to publication bias. Ioannidis (2019) notes the high level of selection bias prevalent in biomedical journals. He defines “selection” as “the collection of choices that lead from the planning of a study to the reporting of p-values.” As an illustration of one form of selection bias, Ioannidis compared “the set of p-values reported in the full text of an article with the set of p-values reported in the abstract.” The main finding, he says, “was that p-values chosen for the abstract tended to show greater significance than those reported in the text, and that the gradient was more pronounced in some types of journals and types of designs.” Ioannidis notes, however, that selection bias “can be present regardless of the approach to inference used.” He argues that in the long run, “the only direct protection must come from standards for reproducible research.”

To be open, remember that one study is rarely enough. The words “a groundbreaking new study” might be loved by news writers but must be resisted by researchers. Breaking ground is only the first step in building a house. It will be suitable for habitation only after much more hard work.

Be open by providing sufficient information so that other researchers can execute meaningful alternative analyses. van Dongen et al. (2019) provide an illustrative example of such alternative analyses by different groups attacking the same problem.

Being open goes hand in hand with being modest.

### 3.4 Be Modest

Researchers of any ilk rarely advertise their personal modesty. Yet the most successful ones cultivate a practice of being modest throughout their research, by understanding and clearly expressing the limitations of their work.

Being modest requires a reality check (Amrhein, Trafimow, and Greenland 2019). “A core problem,” they observe, “is that both scientists and the public confound statistics with reality. But statistical inference is a thought experiment, describing the predictive performance of models about reality. Of necessity, these models are extremely simplified relative to the complexities of actual study conduct and of the reality being studied. Statistical results must eventually mislead us when they are used and communicated as if they present this complex reality, rather than a model for it. This is not a problem of our statistical methods. It is a problem of interpretation and communication of results.”

Be modest in recognizing there is not a “true statistical model” underlying every problem, which is why it is wise to thoughtfully consider many possible models (Lavine 2019). Rougier (2019) calls on researchers to “recognize that behind every choice of null distribution and test statistic, there lurks a plausible family of alternative hypotheses, which can provide more insight into the null distribution.” p-values, confidence intervals, and other statistical measures are all uncertain. Treating them otherwise is immodest overconfidence.

Remember that statistical tools have their limitations. Rose and McGuire (2019) show how use of stepwise regression in health care settings can lead to policies that are unfair.

Remember also that the amount of evidence for or against a hypothesis provided by p-values near the ubiquitous p < 0.05 threshold (Johnson 2019) is usually much less than you think (Benjamin and Berger 2019; Colquhoun 2019; Greenland 2019).

Be modest about the role of statistical inference in scientific inference. “Scientific inference is a far broader concept than statistical inference,” say Hubbard, Haig, and Parsa (2019). “A major focus of scientific inference can be viewed as the pursuit of significant sameness, meaning replicable and empirically generalizable results among phenomena. Regrettably, the obsession with users of statistical inference to report significant differences in data sets actively thwarts cumulative knowledge development.”

The nexus of openness and modesty is to report everything while at the same time not concluding anything from a single study with unwarranted certainty. Because of the strong desire to inform and be informed, there is a relentless demand to state results with certainty. Again, accept uncertainty and embrace variation in associations and effects, because they are always there, like it or not. Understand that expressions of uncertainty are themselves uncertain. Accept that one study is rarely definitive, so encourage, sponsor, conduct, and publish replication studies. Then, use meta-analysis, evidence reviews, and Bayesian methods to synthesize evidence across studies.

Resist the urge to overreach in the generalizability of claims. Watch out for pressure to embellish the abstract or the press release. If the study’s limitations are expressed in the paper but not in the abstract, they may never be read.

Be modest by encouraging others to reproduce your work. Of course, for it to be reproduced readily, you will necessarily have been thoughtful in conducting the research and open in presenting it.

Hubbard and Carriquiry (see their “do list” in Section 7) suggest encouraging reproduction of research by giving “a byline status for researchers who reproduce studies.” They would like to see digital versions of papers dynamically updated to display “Reproduced by….” below original research authors’ names or “not yet reproduced” until it is reproduced.

Indeed, when it comes to reproducibility, Amrhein, Trafimow, and Greenland (2019) demand that we be modest in our expectations. “An important role for statistics in research is the summary and accumulation of information,” they say. “If replications do not find the same results, this is not necessarily a crisis, but is part of a natural process by which science evolves. The goal of scientific methodology should be to direct this evolution toward ever more accurate descriptions of the world and how it works, not toward ever more publication of inferences, conclusions, or decisions.”

Referring to replication studies in psychology, McShane et al. (2019) recommend that future large-scale replication projects “should follow the ‘one phenomenon, many studies’ approach of the Many Labs project and Registered Replication Reports rather than the ‘many phenomena, one study’ approach of the Open Science Collaboration project. In doing so, they should systematically vary method factors across the laboratories involved in the project.” This approach helps achieve the goals of Amrhein, Trafimow, and Greenland (2019) by increasing understanding of why and when results replicate or fail to do so, yielding more accurate descriptions of the world and how it works. It also speaks to significant sameness versus significant difference à la Hubbard, Haig, and Parsa (2019).

Kennedy-Shaffer’s (2019) historical perspective on statistical significance reminds us to be modest, by prompting us to recall how the current state of affairs in p-values has come to be.

Finally, be modest by recognizing that different readers may have very different stakes on the results of your analysis, which means you should try to take the role of a neutral judge rather than an advocate for any hypothesis. This can be done, for example, by pairing every null p-value with a p-value testing an equally reasonable alternative, and by discussing the endpoints of every interval estimate (not only whether it contains the null).

Accept that both scientific inference and statistical inference are hard, and understand that no knowledge will be efficiently advanced using simplistic, mechanical rules and procedures. Accept also that pure objectivity is an unattainable goal—no matter how laudable—and that both subjectivity and expert judgment are intrinsic to the conduct of science and statistics. Accept that there will always be uncertainty, and be thoughtful, open, and modest. ATOM.

And to push this acronym further, we argue in the next section that institutional change is needed, so we put forward that change is needed at the ATOMIC level. Let’s go.

## 4 Editorial, Educational and Other Institutional Practices Will Have to Change

Institutional reform is necessary for moving beyond statistical significance in any context—whether journals, education, academic incentive systems, or others. Several papers in this special issue focus on reform.

Goodman (2019) notes considerable social change is needed in academic institutions, in journals, and among funding and regulatory agencies. He suggests (see Section 7) partnering “with science reform movements and reformers within disciplines, journals, funding agencies and regulators to promote and reward ‘reproducible’ science and diminish the impact of statistical significance on publication, funding and promotion.” Similarly, Colquhoun (2019) says, “In the end, the only way to solve the problem of reproducibility is to do more replication and to reduce the incentives that are imposed on scientists to produce unreliable work. The publish-or-perish culture has damaged science, as has the judgment of their work by silly metrics.”

Trafimow (2019), who added energy to the discussion of p-values a few years ago by banning them from the journal he edits (Fricker et al. 2019), suggests five “nonobvious changes” to editorial practice. These suggestions, which demand reevaluating traditional practices in editorial policy, will not be trivial to implement but would result in massive change in some journals.

Locascio (2017, 2019) suggests that evaluation of manuscripts for publication should be “results-blind.” That is, manuscripts should be assessed for suitability for publication based on the substantive importance of the research without regard to their reported results. Kmetz (2019) supports this approach as well and says that it would be a huge benefit for reviewers, “freeing [them] from their often thankless present jobs and instead allowing them to review research designs for their potential to provide useful knowledge.” (See also “registered reports” from the Center for Open Science (https://cos.io/rr/?_ga=2.184185454.979594832.1547755516-1193527346.1457026171) and “registered replication reports” from the Association for Psychological Science (https://www.psychologicalscience.org/publications/replication) in relation to this concept.)

Amrhein, Trafimow, and Greenland (2019) ask if results-blind publishing means that anything goes, and then answer affirmatively: “Everything should be published in some form if whatever we measured made sense before we obtained the data because it was connected in a potentially useful way to some research question.” Journal editors, they say, “should be proud about [their] exhaustive methods sections” and base their decisions about the suitability of a study for publication “on the quality of its materials and methods rather than on results and conclusions; the quality of the presentation of the latter is only judged after it is determined that the study is valuable based on its materials and methods.”

A “variation on this theme is pre-registered replication, where a replication study, rather than the original study, is subject to strict pre-registration (e.g., Gelman 2015),” says Tong (2019). “A broader vision of this idea (Mogil and Macleod 2017) is to carry out a whole series of exploratory experiments without any formal statistical inference, and summarize the results by descriptive statistics (including graphics) or even just disclosure of the raw data. When results from this series of experiments converge to a single working hypothesis, it can then be subjected to a pre-registered, randomized and blinded, appropriately powered confirmatory experiment, carried out by another laboratory, in which valid statistical inference may be made.”

Hurlbert, Levine, and Utts (2019) urge abandoning the use of “statistically significant” in all its forms and encourage journals to provide instructions to authors along these lines: “There is now wide agreement among many statisticians who have studied the issue that for reporting of statistical tests yielding p-values it is illogical and inappropriate to dichotomize the p-scale and describe results as ‘significant’ and ‘nonsignificant.’ Authors are strongly discouraged from continuing this never justified practice that originated from confusions in the early history of modern statistics.”

Hurlbert, Levine, and Utts (2019) also urge that the ASA Statement on p-Values and Statistical Significance “be sent to the editor-in-chief of every journal in the natural, behavioral and social sciences for forwarding to their respective editorial boards and stables of manuscript reviewers. That would be a good way to quickly improve statistical understanding and practice.” Kmetz (2019) suggests referring to the ASA statement whenever submitting a paper or revision to any editor, peer reviewer, or prospective reader. Hurlbert et al. encourage a “community grassroots effort” to encourage change in journal procedures.

Campbell and Gustafson (2019) propose a statistical model for evaluating publication policies in terms of weighing novelty of studies (and the likelihood of those studies subsequently being found false) against pre-specified study power. They observe that “no publication policy will be perfect. Science is inherently challenging and we must always be willing to accept that a certain proportion of research is potentially false.”

Statistics education will require major changes at all levels to move to a post “p < 0.05” world. Two papers in this special issue make a specific start in that direction (Maurer et al. 2019; Steel, Liermann, and Guttorp 2019), but we hope that volumes will be written on this topic in other venues. We are excited that, with support from the ASA, the US Conference on Teaching Statistics (USCOTS) will focus its 2019 meeting on teaching inference.

The change that needs to happen demands change to editorial practice, to the teaching of statistics at every level where inference is taught, and to much more. However…

## 5 It Is Going to Take Work, and It Is Going to Take Time

If it were easy, it would have already been done, because as we have noted, this is nowhere near the first time the alarm has been sounded.

Why is eliminating the use of p-values as a truth arbiter so hard? “The basic explanation is neither philosophical nor scientific, but sociologic; everyone uses them,” says Goodman (2019). “It’s the same reason we can use money. When everyone believes in something’s value, we can use it for real things; money for food, and p-values for knowledge claims, publication, funding, and promotion. It doesn’t matter if the p-value doesn’t mean what people think it means; it becomes valuable because of what it buys.”

Goodman observes that statisticians alone cannot address the problem, and that “any approach involving only statisticians will not succeed.” He calls on statisticians to ally themselves “both with scientists in other fields and with broader based, multidisciplinary scientific reform movements. What statisticians can do within our own discipline is important, but to effectively disseminate or implement virtually any method or policy, we need partners.”

“The loci of influence,” Goodman says, “include journals, scientific lay and professional media (including social media), research funders, healthcare payors, technology assessors, regulators, academic institutions, the private sector, and professional societies. They also can include policy or informational entities like the National Academies…as well as various other science advisory bodies across the government. Increasingly, they are also including non-traditional science reform organizations comprised both of scientists and of the science literate lay public…and a broad base of health or science advocacy groups…”

It is no wonder, then, that the problem has persisted for so long. And persist it has! Hubbard (2019) looked at citation-count data on twenty-five articles and books severely critical of the effect of null hypothesis significance testing (NHST) on good science. Though the issues were well known, Hubbard says, this did nothing to stem NHST usage over time.

Greenland (personal communication, January 25, 2019) notes that cognitive biases and perverse incentives to offer firm conclusions where none are warranted can warp the use of any method. “The core human and systemic problems are not addressed by shifting blame to p-values and pushing alternatives as magic cures—especially alternatives that have been subject to little or no comparative evaluation in either classrooms or practice,” Greenland said. “What we need now is to move beyond debating only our methods and their interpretations, to concrete proposals for elimination of systemic problems such as pressure to produce noteworthy findings rather than to produce reliable studies and analyses. Review and provisional acceptance of reports before their results are given to the journal (Locascio 2019) is one way to address that pressure, but more ideas are needed since review of promotions and funding applications cannot be so blinded. The challenges of how to deal with human biases and incentives may be the most difficult we must face.” Supporting this view is McShane and Gal’s (2016, 2017) empirical demonstration of cognitive dichotomization errors among biomedical and social science researchers—and even among statisticians.

Challenges for editors and reviewers are many. Here’s an example: Fricker et al. (2019) observed that when p-values were suspended from the journal Basic and Applied Social Psychology, authors tended to overstate conclusions.

With all the challenges, how do we get from here to there, from a “p < 0.05” world to a post “p < 0.05” world?

Matthews (2019) notes that “Any proposal encouraging changes in inferential practice must accept the ubiquity of NHST.…Pragmatism suggests, therefore, that the best hope of achieving a change in practice lies in offering inferential tools that can be used alongside the concepts of NHST, adding value to them while mitigating their most egregious features.”

Benjamin and Berger (2019) propose three practices to help researchers during the transition away from use of statistical significance. “…[O]ur goal is to suggest minimal changes that would require little effort for the scientific community to implement,” they say. “Motivating this goal are our hope that easy (but impactful) changes might be adopted and our worry that more complicated changes could be resisted simply because they are perceived to be too difficult for routine implementation.”

Yet there is also concern that progress will stop after a small step or two. Even some proponents of small steps are clear that those small steps still carry us far short of the destination.

For example, Matthews (2019) says that his proposed methodology “is not a panacea for the inferential ills of the research community.” But that doesn’t make it useless. It may “encourage researchers to move beyond NHST and explore the statistical armamentarium now available to answer the central question of research: what does our study tell us?” he says. It “provides a bridge between the dominant but flawed NHST paradigm and the less familiar but more informative methods of Bayesian estimation.”

Likewise, Benjamin and Berger (2019) observe, “In research communities that are deeply attached to reliance on ‘p < 0.05,’ our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA Statement…”

Yet, like the authors of this editorial, not all authors in this special issue support gradual approaches with transitional methods.

Some (e.g., Amrhein, Trafimow, and Greenland 2019; Hurlbert, Levine, and Utts 2019; McShane et al. 2019) prefer to rip off the bandage and abandon use of statistical significance altogether. In short, no more dichotomizing p-values into categories of “significance.” Notably, these authors do not suggest banning the use of p-values, but rather suggest using them descriptively, treating them as continuous, and assessing their weight or import with nuanced thinking, clear language, and full understanding of their properties.

So even when there is agreement on the destination, there is disagreement about what road to take. The questions around reform need consideration and debate. It might turn out that different fields take different roads.

The catalyst for change may well come from those people who fund, use, or depend on scientific research, say Calin-Jageman and Cumming (2019). They believe this change has not yet happened to the desired level because of “the cognitive opacity of the NHST approach: the counter-intuitive p-value (it’s good when it is small), the mysterious null hypothesis (you want it to be false), and the eminently confusable Type I and Type II errors.”

Reviewers of this editorial asked, as some readers of it will, is a p-value threshold ever okay to use? We asked some of the authors of articles in the special issue that question as well. Authors identified four general instances. Some allowed that, while p-value thresholds should not be used for inference, they might still be useful for applications such as industrial quality control, in which a highly automated decision rule is needed and the costs of erroneous decisions can be carefully weighed when specifying the threshold. Other authors suggested that such dichotomized use of p-values was acceptable in model-fitting and variable selection strategies, again as automated tools, this time for sorting through large numbers of potential models or variables. Still others pointed out that p-values with very low thresholds are used in fields such as physics, genomics, and imaging as a filter for massive numbers of tests. The fourth instance can be described as “confirmatory setting[s] where the study design and statistical analysis plan are specified prior to data collection, and then adhered to during and after it” (Tong 2019). Tong argues these are the only proper settings for formal statistical inference. And Wellek (2017) says at present it is essential in these settings. “[B]inary decision making is indispensable in medicine and related fields,” he says. “[A] radical rejection of the classical principles of statistical inference…is of virtually no help as long as no conclusively substantiated alternative can be offered.”

Eliminating the declaration of “statistical significance” based on p < 0.05 or other arbitrary thresholds will be easier in some venues than others. Most journals, if they are willing, could fairly rapidly implement editorial policies to effect these changes. Suggestions for how to do that are in this special issue of The American Statistician. However, regulatory agencies might require longer timelines for making changes. The U.S. Food and Drug Administration (FDA), for example, has long-established drug review procedures that involve comparing p-values to significance thresholds for Phase III drug trials. Many factors demand consideration, not the least of which is how to avoid turning every drug decision into a court battle. Goodman (2019) cautions that, even as we seek change, “we must respect the reason why the statistical procedures are there in the first place.” Perhaps the ASA could convene a panel of experts, internal and external to FDA, to provide a workable new paradigm. (See Ruberg et al. 2019, who argue for a Bayesian approach that employs data from other trials as a “prior” for Phase 3 trials.)

Change is needed. Change has been needed for decades. Change has been called for by others for quite a while. So…

## 6 Why Will Change Finally Happen Now?

In 1991, a confluence of weather events created a monster storm that came to be known as “the perfect storm,” entering popular culture through a book (Junger 1997) and a 2000 movie starring George Clooney. Concerns about reproducible science, falling public confidence in science, and the initial impact of the ASA statement in heightening awareness of long-known problems created a perfect storm, in this case, a good storm of motivation to make lasting change. Indeed, such change was the intent of the ASA statement, and we expect this special issue of TAS will inject enough additional energy into the storm to make its impact widely felt.

We are not alone in this view. “60+ years of incisive criticism has not yet dethroned NHST as the dominant approach to inference in many fields of science,” note Calin-Jageman and Cumming (2019). “Momentum, though, seems to finally be on the side of reform.”

Goodman (2019) agrees: “The initial slow speed of progress should not be discouraging; that is how all broad-based social movements move forward and we should be playing the long game. But the ball is rolling downhill, the current generation is inspired and impatient to carry this forward.”

So, let’s do it. Let’s move beyond “statistically significant,” even if upheaval and disruption are inevitable for the time being. It’s worth it. In a world beyond “p < 0.05,” by breaking free from the bonds of statistical significance, statistics in science and policy will become more significant than ever.

## 7 Authors’ Suggestions

The editors of this special TAS issue on statistical inference asked all the contact authors to help us summarize the guidance they provided in their papers by providing us a short list of do’s. We asked them to be specific but concise and to be active—start each with a verb. Here is the complete list of the authors’ responses, ordered as the papers appear in this special issue.

### 7.1 Getting to a Post “p < 0.05” Era

Ioannidis, J., What Have We (Not) Learnt From Millions of Scientific Papers With p-Values?

1. Do not use p-values, unless you have clearly thought about the need to use them and they still seem the best choice.
2. Do not favor “statistically significant” results.
3. Do be highly skeptical about “statistically significant” results at the 0.05 level.

Goodman, S., Why Is Getting Rid of p-Values So Hard? Musings on Science and Statistics

1. Partner with science reform movements and reformers within disciplines, journals, funding agencies and regulators to promote and reward reproducible science and diminish the impact of statistical significance on publication, funding and promotion.
2. Speak to and write for the multifarious array of scientific disciplines, showing how statistical uncertainty and reasoning can be conveyed in non-“bright-line” ways both with conventional and alternative approaches. This should be done not just in didactic articles, but also in original or reanalyzed research, to demonstrate that it is publishable.
3. Promote, teach and conduct meta-research within many individual scientific disciplines to demonstrate the adverse effects in each of over-reliance on and misinterpretation of p-values and significance verdicts in individual studies and the benefits of emphasizing estimation and cumulative evidence.
4. Require reporting a quantitative measure of certainty—a “confidence index”—that an observed relationship, or claim, is true. Change analysis goal from achieving significance to appropriately estimating this confidence.
5. Develop and share teaching materials, software, and published case examples to help with all of the do’s above, and to spread progress in one discipline to others.

Hubbard, R., Will the ASA’s Efforts to Improve Statistical Practice be Successful? Some Evidence to the Contrary

This list applies to the ASA and to the professional statistics community more generally.

1. Specify, where/if possible, those situations in which the p-value plays a clearly valuable role in data analysis and interpretation.
2. Contemplate issuing a statement abandoning the use of p-values in null hypothesis significance testing.

Kmetz, J., Correcting Corrupt Research: Recommendations for the Profession to Stop Misuse of p-Values

1. Refer to the ASA statement on p-values whenever submitting a paper or revision to any editor, peer reviewer, or prospective reader. Many in the field do not know of this statement, and having the support of a prestigious organization when authoring any research document will help stop corrupt research from becoming even more dominant than it is.
2. Train graduate students and future researchers by having them reanalyze published studies and post their findings to appropriate websites or weblogs. This practice will benefit not only the students but also the professions, by increasing the amount of replicated (or nonreplicated) research available and readily accessible, as well as the reformer organizations that support replication.
3. Join one or more of the reformer organizations formed or forming in many research fields, and support and publicize their efforts to improve the quality of research practices.
4. Challenge editors and reviewers when they assert that incorrect practices and interpretations of research, consistent with existing null hypothesis significance testing and beliefs regarding p-values, should be followed in papers submitted to their journals. Point out that new submissions have been prepared to be consistent with the ASA statement on p-values.
5. Promote emphasis on research quality rather than research quantity in universities and other institutions where professional advancement depends heavily on research “productivity,” by following the practices recommended in this special journal edition. This recommendation will fall most heavily on those who have already achieved success in their fields, perhaps by following an approach quite different from that which led to their success; whatever the merits of that approach may have been, one objectionable outcome of it has been the production of voluminous corrupt research and creation of an environment that promotes and protects it. We must do better.

Hubbard, D., and Carriquiry, A., Quality Control for Scientific Research: Addressing Reproducibility, Responsiveness and Relevance

1. Compute and prominently display the probability the hypothesis is true (or a probability distribution of an effect size) or provide sufficient information for future researchers and policy makers to compute it.
2. Promote publicly displayed quality control metrics within your field—in particular, support tracking of reproduction studies and computing the “level 1” and even “level 2” priors as required for #1 above.
3. Promote a byline status for researchers who reproduce studies: Digital versions are dynamically updated to display “Reproduced by….” below original research authors’ names or “Not yet reproduced” until it is reproduced.

Brownstein, N., Louis, T., O’Hagan, A., and Pendergast, J., The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making

1. Staff the study team with members who have the necessary knowledge, skills and experience—statistically, scientifically, and otherwise.
2. Include key members of the research team, including statisticians, in all scientific and administrative meetings.
3. Understand that subjective judgments are needed in all stages of a study.
4. Make all judgments as carefully and rigorously as possible and document each decision and rationale for transparency and reproducibility.
5. Use protocol-guided elicitation of judgments.
6. Statisticians specifically should:
• Refine oral and written communication skills.
• Understand their multiple roles and obligations as collaborators.
• Take an active leadership role as a member of the scientific team; contribute throughout all phases of the study.
• Co-own the subject matter—understand a sufficient amount about the relevant science/policy to meld statistical and subject-area expertise.
• Promote the expectation that your collaborators co-own statistical issues.
• Write a statistical analysis plan for all analyses and track any changes to that plan over time.
• Promote co-responsibility for data quality, security, and documentation.
• Reduce unplanned and uncontrolled modeling/testing (HARK-ing, p-hacking); document all analyses.

O’Hagan, A., Expert Knowledge Elicitation: Subjective but Scientific

1. Elicit expert knowledge when data relating to a parameter of interest is weak, ambiguous or indirect.
2. Use a well-designed protocol, such as SHELF, to ensure expert knowledge is elicited in as scientific and unbiased a way as possible.

Kennedy-Shaffer, L., Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing

1. Ensure that inference methods match intuitive understandings of statistical reasoning.
2. Reduce the computational burden for nonstatisticians using statistical methods.
3. Consider changing conditions of statistical and scientific inference in developing statistical methods.
4. Address uncertainty quantitatively and in ways that reward increased precision.

Hubbard, R., Haig, B. D., and Parsa, R. A., The Limited Role of Formal Statistical Inference in Scientific Inference

1. Teach readers that although deemed equivalent in the social, management, and biomedical sciences, formal methods of statistical inference and scientific inference are very different animals.
2. Show these readers that formal methods of statistical inference play only a restricted role in scientific inference.
3. Instruct researchers to pursue significant sameness (i.e., replicable and empirically generalizable results) rather than significant differences in results.
4. Demonstrate how the pursuit of significant differences actively impedes cumulative knowledge development.

McShane, B., Tackett, J., Böckenholt, U., and Gelman, A., Large Scale Replication Projects in Contemporary Psychological Research

1. When planning a replication study of a given psychological phenomenon, bear in mind that replication is complicated in psychological research because studies can never be direct or exact replications of one another, and thus heterogeneity—effect sizes that vary from one study of the phenomenon to the next—cannot be avoided.
2. Future large scale replication projects should follow the “one phenomenon, many studies” approach of the Many Labs project and Registered Replication Reports rather than the “many phenomena, one study” approach of the Open Science Collaboration project. In doing so, they should systematically vary method factors across the laboratories involved in the project.
3. Researchers analyzing the data resulting from large scale replication projects should do so via a hierarchical (or multilevel) model fit to the totality of the individual-level observations. In doing so, all theoretical moderators should be modeled via covariates while all other potential moderators—that is, method factors—should induce variation (i.e., heterogeneity).
4. Assessments of replicability should not depend solely on estimates of effects, or worse, significance tests based on them. Heterogeneity must also be an important consideration in assessing replicability.

### 7.2 Interpreting and Using p

Greenland, S., Valid p-Values Behave Exactly as They Should: Some Misleading Criticisms of p-Values and Their Resolution With s-Values

1. Replace any statements about statistical significance of a result with the p-value from the test, and present the p-value as an equality, not an inequality. For example, if p = 0.03 then “…was statistically significant” would be replaced by “…had p = 0.03,” and “p < 0.05” would be replaced by “p = 0.03.” (An exception: If p is so small that the accuracy becomes very poor then an inequality reflecting that limit is appropriate; e.g., depending on the sample size, p-values from normal or χ2 approximations to discrete data often lack even 1-digit accuracy when p < 0.0001.) In parallel, if p = 0.25 then “…was not statistically significant” would be replaced by “…had p = 0.25,” and “p > 0.05” would be replaced by “p = 0.25.”
2. Present p-values for more than one possibility when testing a targeted parameter. For example, if you discuss the p-value from a test of a null hypothesis, also discuss alongside this null p-value another p-value for a plausible alternative parameter possibility (ideally the one used to calculate power in the study proposal). As another example: if you do an equivalence test, present the p-values for both the lower and upper bounds of the equivalence interval (which are used for equivalence tests based on two one-sided tests).
3. Show confidence intervals for targeted study parameters, but also supplement them with p-values for testing relevant hypotheses (e.g., the p-values for both the null and the alternative hypotheses used for the study design or proposal, as in #2). Confidence intervals only show clearly what is in or out of the interval (i.e., a 95% interval only shows clearly what has p > 0.05 or p ≤ 0.05), but more detail is often desirable for key hypotheses under contention.
4. Compare groups and studies directly by showing p-values and interval estimates for their differences, not by comparing p-values or interval estimates from the two groups or studies. For example, seeing p = 0.03 in males and p = 0.12 in females does not mean that different associations were seen in males and females; instead, one needs a p-value and confidence interval for the difference in the sex-specific associations to examine the between-sex difference. Similarly, if an early study reported a confidence interval which excluded the null and then a subsequent study reported a confidence interval which included the null, that does not mean the studies gave conflicting results or that the second study failed to replicate the first study; instead, one needs a p-value and confidence interval for the difference in the study-specific associations to examine the between-study difference. In all cases, differences-between-differences must be analyzed directly by statistics for that purpose.
5. Supplement a focal p-value p with its Shannon information transform (s-value or surprisal) s = –log2(p). This measures the amount of information supplied by the test against the tested hypothesis (or model): Rounded off, the s-value s shows the number of heads in a row one would need to see when tossing a coin to get the same amount of information against the tosses being “fair” (independent with “heads” probability of 1/2) instead of being loaded for heads. For example, if p = 0.03, this represents –log2(0.03) = 5 bits of information against the hypothesis (like getting 5 heads in a trial of “fairness” with 5 coin tosses); and if p = 0.25, this represents only –log2(0.25) = 2 bits of information against the hypothesis (like getting 2 heads in a trial of “fairness” with only 2 coin tosses).
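Two of the recommendations above lend themselves to a short numeric sketch: the s-value transform in #5, and the direct test of a difference in #4. The estimates and standard errors below are hypothetical numbers chosen so the single-group p-values come out near 0.03 and 0.11:

```python
import math

def s_value(p):
    # Shannon information (surprisal) against the tested hypothesis, in bits
    return -math.log2(p)

def two_sided_p(estimate, se):
    # Two-sided p-value for estimate/se under a normal approximation
    z = abs(estimate) / se
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# p = 0.03 carries about 5 bits of information against the hypothesis
# (like 5 heads in 5 tosses); p = 0.25 carries only 2 bits.
print(round(s_value(0.03)), round(s_value(0.25)))

# Hypothetical sex-specific associations (made-up estimates and SEs):
b_m, se_m = 0.50, 0.23   # males:   p is roughly 0.03
b_f, se_f = 0.30, 0.19   # females: p is roughly 0.11
# Testing the *difference* directly gives a large p-value, so these data
# provide no real evidence that the two associations differ.
p_diff = two_sided_p(b_m - b_f, math.sqrt(se_m**2 + se_f**2))
print(round(p_diff, 2))
```

The made-up example mirrors Greenland’s warning: one “significant” and one “nonsignificant” group p-value, yet the p-value for their difference is large.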

Betensky, R., The p-Value Requires Context, Not a Threshold

1. Interpret the p-value in light of its context of sample size and meaningful effect size.
2. Incorporate the sample size and meaningful effect size into a decision to reject the null hypothesis.

Anderson, A., Assessing Statistical Results: Magnitude, Precision and Model Uncertainty

1. Evaluate the importance of statistical results based on their practical implications.
2. Evaluate the strength of empirical evidence based on the precision of the estimates and the plausibility of the modeling choices.
3. Seek out subject matter expertise when evaluating the importance and the strength of empirical evidence.

Heck, P., and Krueger, J., Putting the p-Value in Its Place

1. Use the p-value as a heuristic, that is, as the basis for a tentative inference regarding the presence or absence of evidence against the tested hypothesis.
2. Supplement the p-value with other, conceptually distinct methods and practices, such as effect size estimates, likelihood ratios, or graphical representations.
3. Strive to embed statistical hypothesis testing within strong a priori theory and a context of relevant prior empirical evidence.

Johnson, V., Evidence From Marginally Significant t-Statistics

1. Be transparent in the number of outcome variables that were analyzed.
2. Report the number (and values) of all test statistics that were calculated.
3. Provide access to protocols for studies involving human or animal subjects.
4. Clearly describe data values that were excluded from analysis and the justification for doing so.
5. Provide sufficient details on experimental design so that other researchers can replicate the experiment.
6. Describe only p-values less than 0.005 as being “statistically significant.”

Fraser, D., The p-Value Function and Statistical Inference

1. Determine a primary variable for assessing the hypothesis at issue.
2. Calculate its well defined distribution function, respecting continuity.
3. Substitute the observed data value to obtain the “p-value function.”
4. Extract the available well defined confidence bounds, confidence intervals, and median estimate.
5. Know that you don’t have an intellectual basis for decisions.

Rougier, J., p-Values, Bayes Factors, and Sufficiency

1. Recognize that behind every choice of null distribution and test statistic, there lurks a plausible family of alternative hypotheses, which can provide more insight into the null distribution.

Rose, S., and McGuire, T., Limitations of p-Values and R-Squared for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment

1. Formulate a clear objective for variable inclusion in regression procedures.
2. Assess all relevant evaluation metrics.
3. Incorporate algorithmic fairness considerations.

### 7.3 Supplementing or Replacing p

Blume, J., Greevy, R., Welty, V., Smith, J., and DuPont, W., An Introduction to Second Generation p-Values

1. Construct a composite null hypothesis by specifying the range of effects that are not scientifically meaningful (do this before looking at the data). Why: Eliminating the conflict between scientific significance and statistical significance has numerous statistical and scientific benefits.
2. Replace classical p-values with second-generation p-values (SGPV). Why: SGPVs accommodate composite null hypotheses and encourage the proper communication of findings.
3. Interpret the SGPV as a high-level summary of what the data say. Why: Science needs a simple indicator of when the data support only meaningful effects (SGPV = 0), when the data support only trivially null effects (SGPV = 1), or when the data are inconclusive (0 < SGPV < 1).
4. Report an interval estimate of effect size (confidence interval, support interval, or credible interval) and note its proximity to the composite null hypothesis. Why: This is a more detailed description of study findings.
5. Consider reporting false discovery rates with SGPVs of 0 or 1. Why: FDRs gauge the chance that an inference is incorrect under assumptions about the data generating process and prior knowledge.
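The SGPV recipe above can be sketched numerically. The minimal Python version below follows my reading of Blume and colleagues’ interval-overlap definition, including their correction for interval estimates more than twice as wide as the interval null; check it against their paper before relying on it:

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    # Fraction of the interval estimate that overlaps the interval null
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))
    est_len = est_hi - est_lo
    null_len = null_hi - null_lo
    if est_len > 2.0 * null_len:
        # Very wide (inconclusive) interval estimates are capped at 1/2
        return 0.5 * overlap / null_len
    return overlap / est_len

null = (-0.5, 0.5)             # hypothetical range of trivially null effects
print(sgpv(1.2, 2.0, *null))   # 0.0: data support only meaningful effects
print(sgpv(-0.2, 0.3, *null))  # 1.0: data support only trivially null effects
print(sgpv(-0.3, 1.5, *null))  # between 0 and 1: inconclusive
```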

Goodman, W., Spruill, S., and Komaroff, E., A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting Its Use

1. Determine how far the true parameter’s value would have to be, in your research context, from exactly equaling the conventional, point null hypothesis to consider that the distance is meaningfully large or practically significant.
2. Combine the conventional p-value criterion with a minimum effect size criterion to generate a two-criteria inference-indicator signal, which provides heuristic, but nondefinitive, evidence for inferring the parameter’s true location.
3. Document the intended criteria for your inference procedures, such as a p-value cut-point and a minimum practically significant effect size, prior to undertaking the procedure.
4. Ensure that you use the appropriate inference method for the data that are obtainable and for the inference that is intended.
5. Acknowledge that every study is fraught with limitations from unknowns regarding true data distributions and other conditions that one’s method assumes.
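The two-criteria signal in do #2 reduces to a simple conjunction. A minimal sketch, with hypothetical values for the p-value cut-point and minimum effect size (per do #3, such criteria should be documented before the procedure is run):

```python
def two_criteria_signal(p, estimate, alpha=0.05, mes=0.30):
    # Signal only when the result is both statistically distinguishable from
    # the point null (p < alpha) and practically large (|estimate| >= mes)
    return (p < alpha) and (abs(estimate) >= mes)

print(two_criteria_signal(0.03, 0.50))   # True:  small p AND meaningful size
print(two_criteria_signal(0.03, 0.10))   # False: "significant" but trivially small
print(two_criteria_signal(0.20, 0.50))   # False: large effect but weak evidence
```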

Benjamin, D., and Berger, J., Three Recommendations for Improving the Use of p-Values

1. Replace the 0.05 “statistical significance” threshold for claims of novel discoveries with a 0.005 threshold and refer to p-values between 0.05 and 0.005 as “suggestive.”
2. Report the data-based odds of the alternative hypothesis to the null hypothesis. If the data-based odds cannot be calculated, then use the p-value to report an upper bound on the data-based odds: 1/(-ep ln p).
3. Report your prior odds and posterior odds (prior odds * data-based odds) of the alternative hypothesis to the null hypothesis. If the data-based odds cannot be calculated, then use your prior odds and the p-value to report an upper bound on your posterior odds: (prior odds) * (1/(-ep ln p)).
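The bound in recommendations 2 and 3 is easy to compute. A minimal sketch (the function names are illustrative; the bound 1/(-ep ln p) is valid for p < 1/e):

```python
import math

def max_data_based_odds(p):
    """Upper bound 1 / (-e * p * ln p) on the data-based odds of the
    alternative to the null implied by a p-value (valid for p < 1/e)."""
    return 1.0 / (-math.e * p * math.log(p))

def max_posterior_odds(prior_odds, p):
    # Recommendation 3: posterior odds = prior odds * data-based odds,
    # so the same bound caps the posterior odds as well.
    return prior_odds * max_data_based_odds(p)

# p = 0.05 caps the odds near 2.5:1, while p = 0.005 caps them near
# 14:1 -- one way to see why recommendation 1 reserves "discovery"
# for the stricter threshold.
max_data_based_odds(0.05)   # ~2.46
max_data_based_odds(0.005)  # ~13.89
```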

Colquhoun, D., The False Positive Risk: A Proposal Concerning What to Do About p-Values

1. Continue to provide p-values and confidence intervals. Although widely misinterpreted, people know how to calculate them and they aren’t entirely useless. Just don’t ever use the terms “statistically significant” or “nonsignificant.”
2. Provide in addition an indication of false positive risk (FPR). This is the probability that the claim of a real effect on the basis of the p-value is in fact false. The FPR (not the p-value) is the probability that your result occurred by chance. For example, the fact that, under plausible assumptions, observation of a p-value close to 0.05 corresponds to an FPR of at least 0.2–0.3 shows clearly the weakness of the conventional criterion for “statistical significance.”
3. Alternatively, specify the prior probability of there being a real effect that one would need to be able to justify in order to achieve an FPR of, say, 0.05.

## Notes:

There are many ways to calculate the FPR. One, based on a point null and a simple alternative, can be calculated with the web calculator at http://fpr-calc.ucl.ac.uk/. However, other approaches to the calculation of FPR, based on different assumptions, give similar results (Table 1 in Colquhoun 2019).

To calculate FPR it is necessary to specify a prior probability and this is rarely known. My recommendation 2 is based on giving the FPR for a prior probability of 0.5. Any higher prior probability of there being a real effect is not justifiable in the absence of hard data. In this sense, the calculated FPR is the minimum that can be expected. More implausible hypotheses would make the problem worse. For example, if the prior probability of there being a real effect were only 0.1, then observation of p = 0.05 would imply a disastrously high FPR = 0.76, and in order to achieve an FPR of 0.05, you’d need to observe p = 0.00045. Others (especially Goodman) have advocated giving likelihood ratios (LRs) in place of p-values. The FPR for a prior of 0.5 is simply 1/(1 + LR), so to give the FPR for a prior of 0.5 is simply a more-easily-comprehensible way of specifying the LR, and so should be acceptable to frequentists and Bayesians.
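Colquhoun's web calculator computes the FPR from an explicit point-null/simple-alternative model. As a self-contained illustration, the identity in the note above (FPR = 1/(1 + LR) for a prior of 0.5, and more generally FPR = 1/(1 + prior odds × LR)) can be combined with the p-value bound on the likelihood ratio, 1/(-ep ln p). The function names are my own, and the bound-based numbers differ slightly from the calculator's model-based ones (e.g., roughly 0.79 rather than 0.76 for a prior of 0.1):

```python
import math

def fpr(prior, lr):
    """False positive risk: posterior probability of the null, given a
    prior probability `prior` of a real effect and a likelihood ratio
    `lr` (data-based odds of H1 over H0).  For prior = 0.5 this
    reduces to 1 / (1 + LR), as in the note above."""
    posterior_odds = (prior / (1.0 - prior)) * lr
    return 1.0 / (1.0 + posterior_odds)

def min_fpr(prior, p):
    # Plug in the largest likelihood ratio a p-value can justify,
    # 1 / (-e p ln p), giving a lower bound on the FPR.
    return fpr(prior, 1.0 / (-math.e * p * math.log(p)))

min_fpr(0.5, 0.05)  # ~0.29: the "at least 0.2-0.3" figure for p near 0.05
min_fpr(0.1, 0.05)  # ~0.79: far worse for an implausible hypothesis
```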

Matthews, R., Moving Toward the Post p < 0.05 Era via the Analysis of Credibility

1. Report the outcome of studies as effect sizes summarized by confidence intervals (CIs) along with their point estimates.
2. Make full use of the point estimate and width and location of the CI relative to the null effect line when interpreting findings. The point estimate is generally the effect size best supported by the study, irrespective of its statistical significance/nonsignificance. Similarly, tight CIs located far from the null effect line generally represent more compelling evidence for a nonzero effect than wide CIs lying close to that line.
3. Use the analysis of credibility (AnCred) to assess quantitatively the credibility of inferences based on the CI. AnCred determines the level of prior evidence needed for a new finding to provide credible evidence for a nonzero effect.
4. Establish whether this required level of prior evidence is supported by current knowledge and insight. If it is, the new result provides credible evidence for a nonzero effect, irrespective of its statistical significance/nonsignificance.

Gannon, M., Pereira, C., and Polpo, A., Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels

1. Retain the useful concept of statistical significance and the same operational procedures as currently used for hypothesis tests, whether frequentist (Neyman–Pearson p-value tests) or Bayesian (Bayes-factor tests).
2. Use tests with a sample-size-dependent significance level—ours is optimal in the sense of the generalized Neyman–Pearson lemma.
3. Use a testing scheme that allows tests of any kind of hypothesis, without restrictions on the dimensionalities of the parameter space or the hypothesis. Note that this should include “sharp” hypotheses, which correspond to subsets of lower dimensionality than the full parameter space.
4. Use hypothesis tests that are compatible with the likelihood principle (LP). They can be easier to interpret consistently than tests that are not LP-compliant.
5. Use numerical methods to handle hypothesis-testing problems with high-dimensional sample spaces or parameter spaces.

Pogrow, S., How Effect Size (Practical Significance) Misleads Clinical Practice: The Case for Switching to Practical Benefit to Assess Applied Research Findings

1. Switch from reliance on statistical or practical significance to the more stringent statistical criterion of practical benefit for (a) assessing whether applied research findings indicate that an intervention is effective and should be adopted and scaled—particularly in complex organizations such as schools and hospitals and (b) determining whether relationships are sufficiently strong and explanatory to be used as a basis for setting policy or practice recommendations. Practical benefit increases the likelihood that observed benefits will replicate in subsequent research and in clinical practice by avoiding the problems associated with relying on small effect sizes.
2. Reform statistics courses in applied disciplines to include the principles of practical benefit, and have students review influential applied research articles in the discipline to determine which findings demonstrate practical benefit.
3. Recognize the need to develop different inferential statistical criteria for assessing the importance of applied research findings as compared to assessing basic research findings.
4. Consider consistent, noticeable improvements across contexts using the quick prototyping methods of improvement science as a preferable methodology for identifying effective practices rather than on relying on RCT methods.
5. Require that applied research reveal the actual unadjusted means/medians of results for all groups and subgroups, and that review panels take such data into account—as opposed to only reporting relative differences between adjusted means/medians. This will help preliminarily identify whether there appear to be clear benefits for an intervention.

### 7.4 Adopting More Holistic Approaches

McShane, B., Gal, D., Gelman, A., Robert, C., and Tackett, J., Abandon Statistical Significance

1. Treat p-values (and other purely statistical measures like confidence intervals and Bayes factors) continuously rather than in a dichotomous or thresholded manner. In doing so, bear in mind that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures because they are, among other things, typically defined relative to the generally uninteresting and implausible null hypothesis of zero effect and zero systematic error.
2. Give consideration to related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. Do this always—not just once some p-value or other statistical threshold has been attained—and do this without giving priority to p-values or other purely statistical measures.
3. Analyze and report all of the data and relevant results rather than focusing on single comparisons that attain some p-value or other statistical threshold.
4. Conduct a decision analysis: p-value and other statistical threshold-based rules implicitly express a particular tradeoff between Type I and Type II error, but in reality this tradeoff should depend on the costs, benefits, and probabilities of all outcomes.
5. Accept uncertainty and embrace variation in effects: we can learn much (indeed, more) about the world by forsaking the false promise of certainty offered by dichotomous declarations of truth or falsity—binary statements about there being “an effect” or “no effect”—based on some p-value or other statistical threshold being attained.
6. Obtain more precise individual-level measurements, use within-person or longitudinal designs more often, and give increased consideration to models that use informative priors, that feature varying treatment effects, and that are multilevel or meta-analytic in nature.

Tong, C., Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science

1. Prioritize effort for sound data production: the planning, design, and execution of the study.
2. Build scientific arguments with many sets of data and multiple lines of evidence.
3. Recognize the difference between exploratory and confirmatory objectives and use distinct statistical strategies for each.
4. Use flexible descriptive methodology, including disciplined data exploration, enlightened data display, and regularized, robust, and nonparametric models, for exploratory research.
5. Restrict statistical inferences to confirmatory analyses for which the study design and statistical analysis plan are pre-specified prior to, and strictly adhered to during, data acquisition.

Amrhein, V., Trafimow, D., and Greenland, S., Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis If We Don’t Expect Replication

1. Do not dichotomize, but embrace variation.

(a) Report and interpret inferential statistics like the p-value in a continuous fashion; do not use the word “significant.”

(b) Interpret interval estimates as “compatibility intervals,” showing effect sizes most compatible with the data, under the model used to compute the interval; do not focus on whether such intervals include or exclude zero.

(c) Treat inferential statistics as highly unstable local descriptions of relations between models and the obtained data.

(i) Free your “negative results” by allowing them to be potentially positive. Most studies with large p-values or interval estimates that include the null should be considered “positive,” in the sense that they usually leave open the possibility of important effects (e.g., the effect sizes within the interval estimates).

(ii) Free your “positive results” by allowing them to be different. Most studies with small p-values or interval estimates that are not near the null should be considered provisional, because in replication studies the p-values could be large and the interval estimates could show very different effect sizes.

(iii) There is no replication crisis if we don’t expect replication. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems such as failure to publish results in conflict with group expectations.

Calin-Jageman, R., and Cumming, G., The New Statistics for Better Science: Ask How Much, How Uncertain, and What Else Is Known

1. Ask quantitative questions and give quantitative answers.
2. Countenance uncertainty in all statistical conclusions, seeking ways to quantify, visualize, and interpret the potential for error.
3. Seek replication, and use quantitative methods to synthesize across data sets as a matter of course.
4. Use Open Science practices to enhance the trustworthiness of research results.
5. Avoid, wherever possible, any use of p-values or NHST.

Ziliak, S., How Large Are Your G-Values? Try Gosset’s Guinnessometrics When a Little “p” Is Not Enough

• G-10 Consider the Purpose of the Inquiry, and Compare with Best Practice. Falsification of a null hypothesis is not the main purpose of the experiment or observational study. Making money or beer or medicine—ideally more and better than the competition and best practice—is. Estimating the importance of your coefficient relative to results reported by others is. To repeat, as the 2016 ASA Statement makes clear, merely falsifying a null hypothesis with a qualitative yes/no, exists/does not exist, significant/not significant answer is not itself significant science, and should be eschewed.
• G-9 Estimate the Stakes (Or Eat Them). Estimation of magnitudes of effects, and demonstrations of their substantive meaning, should be the center of most inquiries. Failure to specify the stakes of a hypothesis is the first step toward eating them (gulp).
• G-8 Study Correlated Data: ABBA, Take a Chance on Me. Most regression models assume “iid” error terms—independently and identically distributed—yet most data in the social and life sciences are correlated by systematic, nonrandom effects—and are thus not independent. Gosset solved the problem of correlated soil plots with the “ABBA” layout, maximizing the correlation of paired differences between the As and Bs with a perfectly balanced chiasmic arrangement.
• G-7 Minimize “Real Error” with the 3 R’s: Represent, Replicate, Reproduce. A test of significance on a single set of data is nearly valueless. Fisher’s p, Student’s t, and other tests should only be used when there is actual repetition of the experiment. “One and done” is scientism, not scientific. Random error is not equal to real error, and is usually smaller and less important than the sum of nonrandom errors. Measurement error, confounding, specification error, and bias of the auspices are frequently larger in all the testing sciences, agronomy to medicine. Guinnessometrics minimizes real error by repeating trials on stratified and balanced yet independent experimental units, controlling as much as possible for local fixed effects.
• G-6 Economize with “Less is More”: Small Samples of Independent Experiments. Small-sample analysis and distribution theory has an economic origin and foundation: changing inputs to the beer on the large scale (for Guinness, enormous global scale) is risky, with more than money at stake. But smaller samples, as Gosset showed in decades of barley and hops experimentation, do not mean “less than,” and Big Data is in any case not the solution for many problems.
• G-5 Keep Your Eyes on the Size Matters/How Much? Question. There will be distractions but the expected loss and profit functions rule, or should. Are regression coefficients or differences between means large or small? Compared to what? How do you know?
• G-4 Visualize. Parameter uncertainty is not the same thing as model uncertainty. Does the result hit you between the eyes? Does the study show magnitudes of effects across the entire distribution? Advances in visualization software continue to outstrip advances in statistical modeling, making more visualization a no-brainer.
• G-3 Consider Posteriors and Priors too (“It pays to go Bayes”). The sample on hand is rarely the only thing that is “known.” Subject matter expertise is an important prior input to statistical design and affects analysis of “posterior” results. For example, Gosset at Guinness was wise to keep quality assurance metrics and bottom line profit at the center of his inquiry. How does prior information fit into the story and evidence? Advances in Bayesian computing software make it easier and easier to do a Bayesian analysis, merging prior and posterior information, values, and knowledge.
• G-2 Cooperate Up, Down, and Across (Networks and Value Chains). For example, where would brewers be today without the continued cooperation of farmers? Perhaps back on the farm and not at the brewery making beer. Statistical science is social, and cooperation helps. Guinness financed a large share of modern statistical theory, and not only by supporting Gosset and other brewers with academic sabbaticals (Ziliak and McCloskey 2008).
• G-1 Answer the Brewer’s Original Question (“How should you set the odds?”). No bright-line rule of statistical significance can answer the brewer’s question. As Gosset said way back in 1904, how you set the odds depends on “the importance of the issues at stake” (e.g., the expected benefit and cost) together with the cost of obtaining new material.

Billheimer, D., Predictive Inference and Scientific Reproducibility

1. Predict observable events or quantities that you care about.
2. Quantify the uncertainty of your predictions.

Manski, C., Treatment Choice With Trial Data: Statistical Decision Theory Should Supplant Hypothesis Testing

1. Statisticians should relearn statistical decision theory, which received considerable attention in the middle of the twentieth century but was largely forgotten by the century’s end.
2. Statistical decision theory should supplant hypothesis testing when statisticians study treatment choice with trial data.
3. Statisticians should use statistical decision theory when analyzing decision making with sample data more generally.

Manski, C., and Tetenov, A., Trial Size for Near Optimal Choice between Surveillance and Aggressive Treatment: Reconsidering MSLT-II

1. Statisticians should relearn statistical decision theory, which received considerable attention in the middle of the twentieth century but was largely forgotten by the century’s end.
2. Statistical decision theory should supplant hypothesis testing when statisticians study treatment choice with trial data.
3. Statisticians should use statistical decision theory when analyzing decision making with sample data more generally.

Lavine, M., Frequentist, Bayes, or Other?

1. Look for and present results from many models that fit the data well.
2. Evaluate models, not just procedures.

Ruberg, S., Harrell, F., Gamalo-Siebers, M., LaVange, L., Lee J., Price K., and Peck C., Inference and Decision-Making for 21st Century Drug Development and Approval

1. Apply Bayesian paradigm as a framework for improving statistical inference and regulatory decision making by using probability assertions about the magnitude of a treatment effect.
2. Incorporate prior data and available information formally into the analysis of the confirmatory trials.
3. Justify and pre-specify how priors are derived and perform sensitivity analysis for a better understanding of the impact of the choice of prior distribution.
4. Employ quantitative utility functions to reflect key considerations from all stakeholders for optimal decisions via a probability-based evaluation of the treatment effects.
5. Intensify training in Bayesian approaches, particularly for decision makers and clinical trialists (e.g., physician scientists in FDA, industry and academia).

van Dongen, N., Wagenmakers, E.J., van Doorn, J., Gronau, Q., van Ravenzwaaij, D., Hoekstra, R., Haucke, M., Lakens, D., Hennig, C., Morey, R., Homer, S., Gelman, A., and Sprenger, J., Multiple Perspectives on Inference for Two Simple Statistical Scenarios

1. Clarify your statistical goals explicitly and unambiguously.
2. Consider the question of interest and choose a statistical approach accordingly.
3. Acknowledge the uncertainty in your statistical conclusions.
4. Explore the robustness of your conclusions by executing several different analyses.
5. Provide enough background information such that other researchers can interpret your results and possibly execute meaningful alternative analyses.

### 7.5 Reforming Institutions: Changing Publication Policies and Statistical Education

Trafimow, D., Five Nonobvious Changes in Editorial Practice for Editors and Reviewers to Consider When Evaluating Submissions in a Post P < 0.05 Universe

1. Tolerate ambiguity.
2. Replace significance testing with a priori thinking.
3. Consider the nature of the contribution, on multiple levels.
4. Emphasize thinking and execution, not results.
5. Consider that the assumption of random and independent sampling might be wrong.

Locascio, J., The Impact of Results Blind Science Publishing on Statistical Consultation and Collaboration

For journal reviewers

1. Provide an initial provisional decision regarding acceptance for publication of a journal manuscript based exclusively on the judged importance of the research issues addressed by the study and the soundness of the reported methodology. (The latter would include appropriateness of data analysis methods.) Give no weight to the reported results of the study per se in the decision as to whether to publish or not.
2. To ensure #1 above is accomplished, commit to an initial decision regarding publication after having been provided with only the Introduction and Methods sections of a manuscript by the editor, not having seen the Abstract, Results, or Discussion. (The latter would be reviewed only if and after a generally irrevocable decision to publish has already been made.)

For investigators/manuscript authors

1. Obtain consultation and collaboration from statistical consultant(s) and research methodologist(s) early in the development and conduct of a research study.
2. Emphasize the clinical and scientific importance of a study in the Introduction section of a manuscript, and give a clear, explicit statement of the research questions being addressed and any hypotheses to be tested.
3. Include a detailed statistical analysis subsection in the Methods section, which would contain, among other things, a justification of the adequacy of the sample size and the reasons various statistical methods were employed. For example, if null hypothesis significance testing and p-values are used, presumably supplemental to other methods, justify why those methods apply and will provide useful additional information in this particular study.
4. Submit for publication reports of well-conducted studies on important research issues regardless of findings, for example, even if only null effects were obtained, hypotheses were not confirmed, mere replication of previous results were found, or results were inconsistent with established theories.

Hurlbert, S., Levine, R., and Utts, J., Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires

1. Encourage journal editorial boards to disallow use of the phrase “statistically significant,” or even “significant,” in manuscripts they will accept for review.
2. Give primary emphasis in abstracts to the magnitudes of those effects most conclusively demonstrated and of greatest import to the subject matter.
3. Report precise p-values or other indices of evidence against null hypotheses as continuous variables not requiring any labeling.
4. Understand the meaning of and rationale for neoFisherian significance assessment (NFSA).

Campbell, H., and Gustafson, P., The World of Research Has Gone Berserk: Modeling the Consequences of Requiring “Greater Statistical Stringency” for Scientific Publication

1. Consider the meta-research implications of implementing new publication/funding policies. Journal editors and research funders should attempt to model the impact of proposed policy changes before any implementation. In this way, we can anticipate the policy impacts (both positive and negative) on the types of studies researchers pursue and the types of scientific articles that ultimately end up published in the literature.

Fricker, R., Burke, K., Han, X., and Woodall, W., Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban

1. Use measures of statistical significance combined with measures of practical significance, such as confidence intervals on effect sizes, in assessing research results.
2. Classify research results as either exploratory or confirmatory and appropriately describe them as such in all published documentation.
3. Define precisely the population of interest in research studies and carefully assess whether the data being analyzed are representative of the population.
4. Understand the limitations of inferential methods applied to observational, convenience, or other nonprobabilistically sampled data.

Maurer, K., Hudiburgh, L., Werwinski, L., and Bailer J., Content Audit for p-Value Principles in Introductory Statistics

1. Evaluate the coverage of p-value principles in the introductory statistics course using rubrics or other systematic assessment guidelines.
2. Discuss and deploy improvements to curriculum coverage of p-value principles.
3. Meet with representatives from other departments, who have majors taking your statistics courses, to make sure that inference is being taught in a way that fits the needs of their disciplines.
4. Ensure that the correct interpretation of p-value principles is a point of emphasis for all faculty members and embedded within all courses of instruction.

Steel, A., Liermann, M., and Guttorp, P., Beyond Calculations: A Course in Statistical Thinking

1. Design curricula to teach students how statistical analyses are embedded within a larger science life-cycle, including steps such as project formulation, exploratory graphing, peer review, and communication beyond scientists.
2. Teach the p-value as only one aspect of a complete data analysis.
3. Prioritize helping students build a strong understanding of what testing and estimation can tell you over teaching statistical procedures.
4. Explicitly teach statistical communication. Effective communication requires that students clearly formulate the benefits and limitations of statistical results.
5. Force students to struggle with poorly defined questions and real, messy data in statistics classes.
6. Encourage students to match the mathematical metric (or data summary) to the scientific question. Teaching students to create customized statistical tests for custom metrics allows statistics to move beyond the mean and pinpoint specific scientific questions.

Gratefully,
Ronald L. Wasserstein
American Statistical Association, Alexandria, VA
ron@amstat.org
Allen L. Schirm
Mathematica Policy Research (retired), Washington, DC
allenschirm@gmail.com
Nicole A. Lazar
Department of Statistics, University of Georgia, Athens, GA
nlazar@stat.uga.edu


## Other articles or books referenced

- Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires. Stuart H. Hurlbert et al., The American Statistician, published online 20 Mar 2019.
- Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science. Christopher Tong, The American Statistician, published online 20 Mar 2019.
- Correcting Corrupt Research: Recommendations for the Profession to Stop Misuse of p-Values. John L. Kmetz, The American Statistician, published online 20 Mar 2019.


Relevant information related to book reading and its interconnected aspects
Part 1
Data obtained from the Internet on 24/05/2021
Rodrigo Nunes Cal
Escola Municipal Dr. Cenobelino de Barros Serra
São José do Rio Preto – SP

Brazil's largest portal for buying and selling books: https://www.estantevirtual.com.br/
Used bookstores and booksellers throughout Brazil: https://www.estantevirtual.com.br/garimpepor/sebose-livreiros
Brazil's largest reading and writing project: https://www.estantemagica.com.br/
https://digital.estantemagica.com.br/ https://blog.estantemagica.com.br/
https://literaturalivre.sescsp.org.br/o-projeto
https://literaturalivre.sescsp.org.br/biblioteca
https://mojo.org.br/blog/
https://literaturalivre.sescsp.org.br/
The fascination that narratives exert over us reveals the evocative power of the word. In this sense, making works of African, Arab, Asian, European, and Jewish literature freely available in Portuguese translation transcends boundaries of time and space, enriching the cultural experience that comes from this encounter. With migratory movements to Brazil as its theme, Literatura Livre helps broaden access to the cultures of different countries and encourage reading, reaffirming the educational purposes of Sesc's sociocultural initiatives. May the digital environment bring diverse audiences closer to the wealth of geographically distant imaginaries whose presence in Brazilian culture is sometimes striking and sometimes diffuse. Sesc São Paulo
Estante Mágica project: Brazil's largest reading and writing project turns your students into authors of their own books, in remote or in-person classes! We are a support for schools and educators, even in times of pandemic. Our project is, and always will be, free.
How does the Estante Mágica project work?
Estante Mágica is a platform that makes children fall in love with reading. The project offers early childhood and primary (fundamental 1) students pedagogical tools and encouragement to write stories that soon make it onto paper and become illustrated books.
Registration: The first step is to create your free account. Register your class, with no minimum or maximum number of students!
Planning: You will find application guides and support materials for fitting the project into your routine of remote or in-person classes.
Application: Your students unleash their creativity and create stories during class. Then just send the texts and images through our app!
Result: Each story becomes a book! Every student receives their digital book for free and can celebrate the achievement at a book-signing event.
Benefits for the school:
1. A project aligned with the BNCC: We offer a new way of working on the Languages area and on your students' socio-emotional skills.
2. Family and school more connected: Parents and guardians take part in the activities and see up close how the school contributes to child development.
3. A differentiator in remote teaching: Create special memories for students in such a difficult pandemic year, and celebrate at an online event.

1. É fácil e cabe na rotina: Oferecemos guias, materiais de apoio e um app que
facilita a aplicação no ensino remoto, presencial ou híbrido.
2. Alunos mais motivados: Os alunos se interessam por leitura e escrita e se
engajam durante as aulas, ficando mais motivados a aprender.
3. Professores reconhecidos: Seu nome fica registrado no livro infantil, e seu
Faça tudo pelo nosso aplicativo!
É aqui que a mágica acontece: tudo em um só lugar, de forma prática e rápida.
1. Register your class and your students
2. Invite your team to manage the classes
3. Invite your administrator to follow the project
4. Spread the word and recommend it to your friends
5. Organize your time better
6. Submit your students' works
7. See the digital book ready right away!
Estante Mágica: a project that brings out the writer in children
Imagination, commitment, talent, and love land on paper and turn children into authors of illustrated books.
Estante Mágica is a platform that turns children into passionate readers. The project gives preschool and early-elementary students pedagogical tools and encouragement to write stories that soon land on paper and become illustrated books. What a thrill to write and have your ideas published!
Estante Mágica partners with schools all over Brazil and provides full, free access to its tools. Each teacher chooses the lesson plan to apply to each class and, with the teachers' support, children are encouraged to create inside the classroom, each one becoming a young author.
Today the platform has more than three thousand partner schools and has already helped more than 250 thousand young writers put their imagination on paper. The books cost from R$39 to R$59, and there is even an autograph session.
What do students gain from writing their own stories?
The list is huge, but we can highlight the development of creativity, self-esteem, protagonism, and self-confidence. Not to mention the pride students and their families feel when the book is launched!
The team behind the project wants to go further: the plan is to reach 350 thousand authors in 2018 and, by 2030, one billion children. Now that is spreading children's talent around the world.
Charmed by the project? Learn more about Estante Mágica at the link:

## http://www.estantemagica.com.br

https://www.dentrodahistoria.com.br/blog/literatura/livros-paracriancas/100-livros-infantis-gratis/
+100 free children's books to read with the kids
October 26, 2020
Discover sites where you can access more than 100 free children's books to read with the kids and encourage the reading habit from an early age.
We know that, when encouraged early, children soon become voracious readers, stringing one book right after another and always asking for one more story. To help families never run out of a story to read, here is a list of sites that offer free children's books.
Why reading to children matters
Reading is a habit that brings countless benefits to child development. It strengthens attention, concentration, memory, and reasoning. Through contact with literary texts, children expand their vocabulary and come to understand the world around them better.
Reading also stimulates imagination and creativity, and it is one of the most important stimuli for Generation Alpha, the generation of digital-native children.
Where to access free children's books
Here are 5 platforms where you can access free children's books and read to the kids at any time:
1 – Dentro da História
https://www.dentrodahistoria.com.br/
The Dentro da História platform encourages reading by turning children into characters in children's stories.
On the site, you can create a character with the child's own features and then read the book online for free.
More than 100 stories are available, and the site itself offers recommendations by age group:
• Babies up to 2 years
https://www.dentrodahistoria.com.br/livros-bebes
• 3 to 5 years
https://www.dentrodahistoria.com.br/livros-criancas-3a5
• 6 to 8 years
https://www.dentrodahistoria.com.br/livros-criancas-6a8
• 9 to 12 years
https://www.dentrodahistoria.com.br/livros-criancas-9a12
After reading the story online, you can buy the book and receive it at home, printed in hardcover and with a dedication as well.
Here are some of the most-read stories:
• Turma da Mônica – Aventura no Limoeiro
• Frozen – Em Clima de Diversão
• Mundo Bita – Deu Fome!
• Batman – Uma Dupla em Ação
• Monteiro Lobato – O Reino das Águas Claras
2 – Eu Leio para uma Criança
https://www.euleioparaumacrianca.com.br/estante-digital/
The Eu Leio para uma Criança platform is part of a campaign by Itaú Unibanco and Fundação Itaú Social started 10 years ago, which currently offers free children's books in digital format to read on a phone or computer. Some of the available books are:
• Malala, a menina que queria ir para a escola
• A menina das estrelas
• O menino e o foguete
Example: Da Janela de Minas
This book is based on one of the winning entries in the poem category of the 2019 Olimpíada de Língua Portuguesa. The girl, from the city of Belo Horizonte, Minas Gerais, was only 12 years old when she wrote it.
https://www.euleioparaumacrianca.com.br/historias/da-janela-deminas/ (you can read the book on the site and/or download it as a PDF)

3 – Domínio Público
The government platform Domínio Público offers free children's books as PDFs with open access in a digital library. The goal is to promote broad access to books already in the public domain, that is, books no longer under copyright that can therefore be accessed for free.
Some examples:
• A Bruxa e o Caldeirão
• No Reino das Letras Felizes
4 – Espaço de Leitura
http://espacodeleitura.labedu.org.br/
http://labedu.org.br/plataformas/
Espaço de Leitura is a platform from the LabEdu project whose goal is to offer children opportunities to develop through meaningful learning. The platform has several stories in digital format to read for free, along with supporting materials.
Some of the stories:
• A Receita da Mandrágora
• Onde está o meu cachorro?
• O Duende Gumercindo
Example: O álbum de Irina: http://espacodeleitura.labedu.org.br/livros/oalbum-de-irina/?leitor=1
Learning Platforms – http://labedu.org.br/plataformas/

1. What they are
2. Learning potential
3. Access the platforms
0 to 10 YEARS
Aprendendo
A digital app that gives adults tips for interacting productively with children in everyday life, at home and away. It suggests simple activities that can be done with children in different settings and at different moments of the daily routine.
http://labedu.org.br/apprendendo/

0 to 5 years
Aprender Linguagem
A visual guide to language development that leads adults to understand the importance of the interactions they have with children.
A complete guide for families and educators on how children acquire language between 0 and 5 years of age.
Within the 0-to-5 range, four developmental stages stand out, marked by major milestones in language learning: from initial comprehension and the acquisition of a first vocabulary to becoming a native speaker.
Topics:
http://aprenderlinguagem.org.br/
http://aprenderlinguagem.org.br/temas/interacao/
http://aprenderlinguagem.org.br/temas/vocabulario/
http://aprenderlinguagem.org.br/temas/discurso/
http://aprenderlinguagem.org.br/temas/fonetica-e-fonologia/
http://aprenderlinguagem.org.br/temas/gramatica/
http://aprenderlinguagem.org.br/temas/lingua-escrita/
0 to 18 months: Interaction, Phonetics and Phonology, Vocabulary, and Grammar
18 months to 3 years / 3 to 4 years: Vocabulary, Phonetics and Phonology, Grammar, and Discourse
4 to 5 years: Vocabulary, Grammar, Discourse, and Written Language

6 to 8 YEARS
Espaço de Leitura
A collection of digital books exploring language and different ways of reading.
http://espacodeleitura.labedu.org.br/
Espaço de Leitura offers different experiences to anyone interested in enjoying rich moments of reading. Get to know the available tools and see how easy it is to interact with the platform!
http://espacodeleitura.labedu.org.br/sobre/?session=leitura-dos-contos
http://espacodeleitura.labedu.org.br/livros/
READ WITH THE CHILDREN
PLAY
HOW TO GO DEEPER?
Stories you will find
A receita de Mandrágora
A lenda de Sigurd
As sete cabritinhas e o lobo
Onde está o meu cachorro?
Dois irmãos
O Duende Gumercindo
O álbum de Irina
LINGUISTIC CONTENT: Autobiography
SYNOPSIS: Irina's mother is expecting a baby, and for days the whole family has been preparing for the arrival. Including Irina who, to make better sense of all the moving objects and emotions, finds herself facing her own story in an album of keepsakes put together by her mother!
9 to 10 YEARS
Aprender a Estudar Textos
https://aprenderaestudartextos.org.br/
A teacher-training methodology for boosting the learning of academic language as a tool for accessing knowledge.
0 to 6 YEARS
Aprender com 7 experiências fundamentais
https://labedu.org.br/7-experiencias/
A methodology for engaging social agents in the cause of early childhood. It seeks to offer references so that different institutions and actors, each in their own way, can contribute to child development in the community.
1. Speaking and being heard
2. Playing
3. Living together
4. Counting and comparing
5. Listening to texts read aloud
6. Exploring and questioning
7. Experiencing and appreciating different forms of art
It is a platform of themed outing itineraries that use the city of São Paulo as a learning space. The fun, educational itineraries let you see the city with new eyes. An outing can be a great educational experience!
All our projects are the result of an ongoing cycle of content production, research, and implementation that generates applicable knowledge to influence educational processes, inside and outside school. Here are some of the numbers reached by December 2018.

5 – Free children's books on Amazon
https://www.amazon.com.br/s?bbn=6311441011&rh=n%3A6311441011%2Cn%3A5559842011&dc&fst=as%3Aoff&qid=1603724639&rnid=6311441011&ref=lp_6311441011_nr_n_15
For those who do not know it, Amazon is one of the five largest technology companies in the world and stands out for selling books online. On the site, you can find a list of free children's books as ebooks.
Some of the available titles are:
• O Elefante em Apuros
• Hai-kais do Menino Maluquinho
• O Relógio que Perdeu a Hora
To access these digital books, just create an Amazon account and download the Kindle app on any phone or tablet.
Sites and publishers offering free books for children and teenagers during quarantine
Published on: May 11, 2020
Books can be good allies in keeping the kids entertained during the social isolation imposed on us by the coronavirus. And given the economic impact, free access to books is even better. Below we list sites and publishers offering free books for children and teenagers during quarantine.
Make the most of it and set aside some time to read with the kids!
Bookplay Kids
Bookplay Kids is an app for children of all ages with thousands of books, audiobooks, and educational games. The platform uses technology to stimulate children's intellectual and social-emotional development.
Why get Bookplay Kids?
• A safe, ad-free environment.
• An app that entertains children and teaches at the same time.
• Content divided into easy-to-browse categories.
• With a single account, you get unlimited, simultaneous access to all the content for up to 3 profiles.
• Rate the content so Bookplay can make recommendations for you.
https://apps.apple.com/br/app/bookplay-kids/id1492268472
https://bookplay.com.br/kids/
The Bookplay Kids app, from Bookplay, gathers content featuring Marvel superheroes and Disney princesses. For younger children, it brings special collections from animations such as Peppa Pig and Patrulha Canina. The audiobooks also have a dedicated space with printable activities. Starting May 15, the app offers a thirty-day free trial; the subscription costs R$39.90. Available for iOS and Android.

Auti Books
https://www.autibooks.com/
Auti Books has opened free access to 10 children's audiobooks, all classics of literature: O Gato de Botas, Chapeuzinho Vermelho, Pinóquio, Os Três Porquinhos, Ali Babá e os Quarenta Ladrões, O Patinho Feio, A Festa no Céu, João e Maria, O Pequeno Polegar, and O Soldadinho de Chumbo. The titles can be found on the Auti Books site and in the app for Android and iOS. To download the free titles, just choose a book and, at checkout, apply the coupon vamosajudar.

FTD Educação
https://conteudoaberto.ftd.com.br/
https://conteudoaberto.ftd.com.br/revista-mundo-escolar/
https://mundoescolaronline.com.br/sumario-edicao-05/
https://conteudoaberto.ftd.com.br/category/recursos-para-as-aulas/educacaoinfantil/
https://conteudoaberto.ftd.com.br/recursos-para-as-aulas/o-aprendiz-de-paje/
https://cache-conteudoaberto.s3.amazonaws.com/filecontent/VIDEOS/RABECA-E-CAVAQUINHO-O-APRENDIZ-DE-PAJE.mp4
https://conteudoaberto.ftd.com.br/category/exclusivo-para-oprofessor/ebooks/
The Conteúdo Aberto portal, from FTD Educação, offers a varied collection of Brazilian literature. You will find books by Ana Maria Machado, such as Festa no Céu and Cachinhos de Ouro, and the "Meu Primeiro Lobato" collection, with classics such as Narizinho. There are also international titles, such as Vinte Mil Léguas Submarinas, by Júlio Verne.

Espaço de Leitura
Espaço de Leitura (espacodeleitura.labedu.org.br) is a platform offering a range of resources to enrich children's learning in language acquisition and development. You can, for example, switch between different ways of reading and play games that work on the content covered in the stories. Although a child can browse Espaço de Leitura alone, the site provides guidance so parents know how to enhance their children's learning experience.
Kidsbooks – Itaú Criança
https://www.euleioparaumacrianca.com.br/
Estante Digital
A series of free children's books that fits in a bag, in a pocket, and even in the palm of your hand: it lives on your phone. So you can read to a child anytime, anywhere.
https://www.euleioparaumacrianca.com.br/estante-digital/
https://www.euleioparaumacrianca.com.br/video-conheca-a-estante-digital/
https://www.euleioparaumacrianca.com.br/livros-acessiveis/
The bank's online platform currently gathers 13 children's books, all available without filling out any registration. Many of the stories were written by leading names in contemporary Brazilian literature, with lively narratives that can be read on a phone or computer. The authors include Conceição Evaristo, Luis Fernando Verissimo, Antonio Prata, Marcelo Rubens Paiva, and Adriana Carranca, for example.
• Various authors
• Available at euleioparaumacrianca.com.br
Want to receive free books on WhatsApp? Send a message and see how simple it is to request your digital book.
Chat with Itaú – Leia para uma criança on WhatsApp
https://api.whatsapp.com/send/?phone=5511981511078&text&app_absent=0

Amazon
https://www.amazon.com.br/b?ie=UTF8&node=6311441011
https://www.amazon.com.br/s?rh=n%3A6311441011&fs=true&ref=lp_6311441011_sar
https://www.amazon.com.br/s?i=digitaltext&rh=n%3A6311441011&fs=true&page=2&qid=1621875814&ref=sr_pg_2
https://www.amazon.com.br/s?i=digitaltext&rh=n%3A6311441011&fs=true&page=3&qid=1621875841&ref=sr_pg_3
https://www.amazon.com.br/s?i=digitaltext&rh=n%3A6311441011&fs=true&page=9&qid=1621875890&ref=sr_pg_8
https://www.amazon.com.br/s?i=digitaltext&rh=n%3A6311441011&fs=true&page=20&qid=1621875949&ref=sr_pg_20
https://www.amazon.com.br/s?i=digitaltext&rh=n%3A6311441011&fs=true&page=35&qid=1621876014&ref=sr_pg_35
https://www.amazon.com.br/s?i=digitaltext&rh=n%3A6311441011&fs=true&page=62&qid=1621876125&ref=sr_pg_62
https://www.amazon.com.br/s?i=digitaltext&rh=n%3A6311441011&fs=true&page=70&qid=1621876187&ref=sr_pg_70
Plus other Amazon links.

Free eBooks
https://www.amazon.com.br/b?ie=UTF8&node=6311441011
Amazon offers e-books with many free options for Kindle. To find them, go to the Kindle Store, select the "eBooks Kindle" tab, then "Infantil", and finally click "Top 100 free downloads".
• Various authors
• Available through the app and at http://www.amazon.com.br

Storyline Online
https://storylineonline.net/
The site compiles children's books, many of them classics of United States literature, narrated by different names such as Oprah Winfrey and Rami Malek, winner of the Best Actor Oscar. They can be watched on YouTube with Portuguese subtitles.
• Various authors
• Available at storylineonline.net

Amal
The protagonist of this digital book was forced to do exactly the opposite. In the story, the Syrian girl Amal woke up one day to bombs and had to leave her home because of the war.
The refugee crisis, reflected in the girl's life, is narrated throughout, like a storytelling session with background music and illustrations by Renato Moriconi.
• Author: Carolina Montenegro
• Illustrations: Renato Moriconi
• Ed. Caixote
• Available on the App Store and Google Play

Nautilus
Inspired by the classic Vinte Mil Léguas Submarinas, by Júlio Verne, this digital book is recommended for older children and preteens. In the adventure, a professor and his servant set off on a sea voyage and end up inside Captain Nemo's submarine. A winner of the Jabuti prize, the app features animated illustrations and information about Júlio Verne and his books.
• Author: Maurício Boff
• Illustrations: Fernando Tangi
• Ed. Storymax
• Available on the App Store and Google Play

WizardingWorld.com
https://www.wizardingworld.com/collections/starting-harry-potter
"Harry Potter e a Pedra Filosofal", the first volume of the series about the boy wizard, is available online for free as an e-book and audiobook during the month of April. A new online portal dedicated to young children, Harry Potter At Home, will launch on WizardingWorld.com, the official fan site for Harry Potter and the spin-off film series "Animais Fantásticos". Every week, "Wizarding Wednesdays" and an e-mail newsletter will offer creative activities, quizzes, and ideas.

Domínio Público
http://www.dominiopublico.gov.br/pesquisa/ResultadoPesquisaObraForm.do?first=50&skip=0&ds_titulo&co_autor&no_autor&co_categoria=33&pagina=1&select_action=Submit&co_midia=2&co_obra&co_idioma=1&colunaOrdenar=null&ordem=null
There are about 250 children's books in PDF with free, open access in the digital library. The platform's main goal is to promote broad access to literary, artistic, and scientific works that are already in the public domain or whose distribution has been duly authorized.
Bamboleio
http://site.bamboleio.com.br/
This digital children's literature library offers its collection free for 45 days. Besides expert curation, the books are organized by theme, and there is support material for adults, with reading tips and ideas for activities based on the books.

Brinque-Book
https://www.brinquebook.com.br/
The titles are narrated by storytellers and enriched with sound effects. They can be accessed by smartphone, tablet, computer, or podcast.

Editora Panini
https://loja.panini.com.br/
The publisher offers 20 of its publications for digital purchase through Amazon Kindle, Kobo, and Google Play.

Turma da Mônica
https://turmadamonica.uol.com.br/home/
For comics fans, and to help keep the kids entertained too, publisher Mauricio de Sousa Produções has released a package of 188 comic books to read for free. To access them, download the Banca da Mônica app, available for Android and iOS (iPhone and iPad). The comics are free until April 25, 2020.

Toca Livros
https://www.tocalivros.com/audiobook/audiolivros-gratis
The site has a paid subscription option but also offers free audiobooks such as O Pequeno Príncipe, Macunaíma, Peter Pan, and Machado de Assis – Vida e Obra.

LibriVox
https://librivox.org/
https://librivox.org/search?primary_key=52&search_category=language&search_page=1&search_form=get_results
https://librivox.org/search?primary_key=0&search_category=author&search_page=1&search_form=get_results
The site offers, free of charge, spoken versions of literary works already in the public domain.
Companhia das Letras
https://www.blogdacompanhia.com.br/conteudos/visualizar/E-booksgratuitos-Leia-Em-Casa
https://www.blogdacompanhia.com.br/conteudos/visualizar/E-booksgratuitos
Among the ten e-books the group has made available for free are "Reinações de Narizinho", by Monteiro Lobato, and "Viagem ao centro da Terra", by Júlio Verne. The digital versions can be read on a phone, tablet, computer, or e-reader.

Editora L&PM
https://www.lpm.com.br/site/
Besides offering all the digital books in its catalog at a 30% discount, the publisher launched a promotion of one free eBook per day. The available titles include the children's classics "Alice no País das Maravilhas", by Lewis Carroll, and "Viagem ao Centro da Terra", by Júlio Verne.

8 sites that offer literary works for free
https://www.sesc-sc.com.br/blog/cultura/conheca-8-sites-que-disponibilizam-obrasliterarias-gratuitamente
Our library team recommends digital platforms for online access to children's, young-adult, and adult books. It also lists publishers currently offering free shipping for publications. Check it out!
Reading is traveling without leaving home! In quarantine times, literature is an ally that helps children and adults get through this period more lightly, in the good company of books. So that you and your family can dive into this universe without leaving home, the Sesc-SC Library Network team has gathered 8 sites that offer literary works for free download, plus audiobooks, literary games, and play ideas. It also lists publishers offering free shipping, for anyone who wants to refresh a personal collection. And a special literary invitation: follow the daily storytelling sessions by the institution's culture and library staff on Sesc-SC's social media (Instagram @Sesc-SC and Facebook/SescSC), in the "Hora da Leitura" series.
Check them out and enjoy!
1 – Autibook platform
https://www.autibooks.com/
• The site is offering 10 classic children's stories as audiobooks. To listen, just choose a book and add it to the cart. When checking out, apply the coupon vamosajudar and the download will be free of charge.
2 – Itaú Cultural
https://www.euleioparaumacrianca.com.br/estante-digital/
• On the "Leia para uma criança" platform, Itaú Cultural offers a series of children's books that fit in a bag, a pocket, and even the palm of your hand: they live on your phone. There are 13 animated digital stories to read anytime, anywhere.
3 – Fundação Educar Dpaschoal
http://www.educardpaschoal.org.br/projeto.php?id=4&page=74
• The "Leia Comigo" project edits, publishes, and distributes children's and young-adult literature free of charge all over Brazil. It currently offers 15 digital books for free download.
4 – Portal Domínio Público
• There are 21 children's titles in PDF permanently available for reading on a platform provided by the Federal Government.
http://www.dominiopublico.gov.br/pesquisa/ResultadoPesquisaObraForm.do?first=50&skip=0&ds_titulo=&co_autor=&no_autor=&co_categoria=33&pagina=1&select_action=Submit&co_midia=2&co_obra=&co_idioma=1&colunaOrdenar=null&ordem=null
• It also offers public-domain classics for teens and adults, by authors such as Machado de Assis and Fernando Pessoa.
http://www.dominiopublico.gov.br/pesquisa/PesquisaObraForm.jsp
5 – Biblioteca Brasiliana Guita e José Mindlin
• Many literary works can be accessed thanks to the digitization of part of the Biblioteca Brasiliana Guita e José Mindlin collection. Users can search online, view the digitized works, and download them.
https://digital.bbm.usp.br/handle/bbm/1
6 – Many Books
• Classic and current titles can be found on Many Books. There are public-domain books and books whose rights were granted exclusively to the platform. The site is in English, but many works in Portuguese can be found there.
https://manybooks.net/search-book?language%5Bpt%5D=pt
7 – Instagram – @independente.de.tudo
• Free books from independent publishers are being shared through this Instagram account, created especially for this moment when we cannot leave home. To get access, register your e-mail and the titles you would like to receive.
https://www.instagram.com/independente.de.tudo/?igshid=uxkltmucfqz1
8 – Editora Zazie Edições
• Several contemporary titles from the independent, nonprofit publisher Zazie Edições, based in Copenhagen and guided by open-access principles, can be accessed for free.
http://zazie.com.br/pequena-biblioteca-de-ensaios/
• Tip: publishers with free delivery
During this period, some publishers and bookstores are also keeping up their sales operations with free book delivery, which can go a long way toward keeping these businesses running.
One tip is to pick authors from Santa Catarina, who can be found on the sites below:
• Editora Caiaponte
https://pt.wikipedia.org/wiki/Caiaponte
https://www.instagram.com/caiaponte/
https://www.caiaponte.com/
https://marcelolabes.wordpress.com/portfolio/textos/
• Editora Patuá
https://www.editorapatua.com.br/
• Livraria Livros e Livros, with free delivery in the Florianópolis region
https://www.livroselivros.com.br/
• Tips for engaging with literature online on the "A Taba" blog
https://blog.ataba.com.br/jogo-gratis/
https://blog.ataba.com.br/10-melhores-brincadeiras-para-criancas-de-0-a-10-anos/
• 10 fun literary play ideas that explore stories and language, from the "A Taba" blog.
• The board game "Que história é essa?". To download the game, provide the e-mail address where you would like it sent.
We hope you enjoyed it! With the collaboration of Marilaine Hahn, Social Programming Analyst, Sesc-SC Culture, Regional Department. Edited by the Sesc-SC Communication and Marketing Office (last updated: 03/28/2020).

During quarantine, here is how to find free ebooks to read
Published by Manuela Figueredo in CONSUMO at 9:15
https://blogs.ne10.uol.com.br/mundobit/2020/04/01/durante-quarentenasaiba-como-encontrar-ebooks-gratuitos-para-ler/
Good news for avid readers: sites and publishers are offering free ebooks and digital titles to those isolated at home during the coronavirus quarantine. You do not even need a dedicated device such as the Kindle (Amazon) or the Kobo Rakuten to access the books: apps for both Android and iOS let you read digital books, and you can also read directly on a computer or notebook if you prefer. With that in mind, Mundo Bit has prepared a list of options so you can get through this difficult moment doing what you love most.
Amazon
The company already has a good variety of free e-books, including in the Kindle Unlimited option, where a monthly fee gives access to all titles in the package. With the pandemic, more titles have become free. The works available for download include classics of Brazilian literature such as Macunaíma, Dom Casmurro, O Cortiço, Memórias Póstumas de Brás Cubas, and Os Sertões. You can access and download the digital books here:
https://www.amazon.com.br/b/ref=amb_link_1?ie=UTF8&node=6311441011&pf_rd_m=A1ZZFT5FULY4LN&pf_rd_s=merchandised-search-left6&pf_rd_r=WSG77W8DZ3QQF4DC5QGS&pf_rd_r=WSG77W8DZ3QQF4DC5QGS&pf_rd_t=101&pf_rd_p=7a11afd6-4a9a-4732-b658-c1aacd81d574&pf_rd_p=7a11afd6-4a9a-4732-b658-c1aacd81d574&pf_rd_i=6311441011

Kobo Rakuten
A competitor of the Amazon Kindle, Kobo also allows free reading of some e-books through its app or its own e-reader, the device called Kobo. On the store's site you will find free titles in fiction, biography, romance, cooking and gastronomy, young-adult literature, suspense, and horror, among others.

Livraria Cultura
On its online platform, the bookstore's collection of free titles includes ebooks, comics, works for professional fields, and foreign literature.

Lelivros
Aiming to democratize access to free reading, Le Livros is a Brazilian platform that gathers, entirely free of charge, thousands of public-domain and licensed titles in the most varied genres. The site offers the PDF, ePUB (iPad and Kindle), and Mobi (Amazon) formats, and also provides a link to read online if you prefer not to download.

Open Library
The Open Library is a free, collaborative library with more than one million free e-books. There you will find authors from many parts of the world, including Brazilians.
The only caveat is that there is a limit to how many books you can borrow at a time, within a given period.

Project Gutenberg
Project Gutenberg is a digital library with more than 60 thousand free e-books. On the project's site you will find hundreds of works published around the world, in many languages. Most of the collection consists of titles already in the public domain.

Even for the kids: Turma da Mônica
For comics fans and especially for children, publisher Mauricio de Sousa Produções has released a package of 188 comic books to read for free. To access them, download the Banca da Mônica app, available for Android and iOS (iPhone and iPad). The comics will be free until April 25, 2020.

Kobo Rakuten: https://www.kobo.com/br/pt/p/livros-gratis
Livraria Cultura: https://www3.livrariacultura.com.br/ebooks/gratis?PS=24&O=OrderByPriceASC
Lelivros: https://lelivros.love/
Open Library: https://openlibrary.org/
Project Gutenberg: https://www.gutenberg.org/
Turma da Mônica – Editora Maurício de Souza Produções: https://turmadamonica.uol.com.br/home/

Free digital books and comics to encourage children to read
Parents can access free digital books and comics to encourage their children to read more
By Heloisa Scognamiglio – April 13, 2020
https://cangurunews.com.br/livros-digitais-gratuitos/
The 5th edition of the Retratos da Leitura no Brasil survey, coordinated by Instituto Pró-Livro, showed a drop in the number of readers in the country between 2015 and 2019. In the 5-to-10 age group, however, the opposite happened: the number of child readers grew, fortunately! Many studies have shown that reading brings numerous benefits to children, not only in writing, text interpretation, imagination, and creativity, but also in their personality, their relationships at home, and their social life.
This shows how important it is to encourage reading from early childhood, in the most varied ways, whether by reading to children or by encouraging them to read on their own, even if in their own way. With that in mind, we have gathered books and digital comics that are freely available for you to read with the little ones.

Amazon
Amazon has made 50 thousand digital books available for free download. Among the titles are several children's works, including classics such as "Peter Pan" and "O Rei Leão", and several editions of "Harry Potter". There is also "A Vizinha Antipática que Sabia Matemática", by Eliana Martins and Suppa, which tells the story of Theo, a boy who did not like math at all, and "Vamos dar a volta ao mundo?", by Marina Klink, which invites kids to join the Klink family in getting to know the planet. Check all the available works here:
https://www.amazon.com.br/s?rh=n%3A6311441011&fs=true&ref=lp_6311441011_sar
The titles are part of the Kindle Unlimited program, which requires a valid Amazon.com.br account with a valid payment method linked to it. After the 30-day promotional plan, customers are charged monthly at the program's regular price (R$19.90 per month); automatic renewal can be disabled at any time.
Books downloaded from the Amazon Kindle Store can be read on Kindle e-reader devices or in the free Kindle app, available for computers, tablets, and phones (Android and iOS).
Leia também – 11 livros para conversar sobre a morte com as criancas
Editora Pulo do Gato
The publisher is sharing some of its children's titles for free on Issuu, each accompanied by a short guide on how to explore the work. Below are the links to the titles made available so far, along with their guides:
https://issuu.com/pulodogato
Diário de Blumka
E-book | Reading guide
Eloísa e os bichos
E-book | Reading guide

Letras de carvão
E-book | Reading guide
"Amal"
App stores offer digital-book options: apps that narrate stories with illustrations and animations. Among them is "Amal," whose protagonist is forced to leave her home in Syria because of the war. The author is Carolina Montenegro; the story has narrators, background music, and illustrations by Renato Moriconi. Published by Ed. Caixote. Available on Google Play and the App Store.
Read also – 6 books that talk about sexual abuse (and help prevent it)
https://cangurunews.com.br/livro-que-falam-sobre-abuso-sexual/
Laboratório de Educação
Created by Beatriz Cardoso and Andrea Guida Bisognin, this Laboratório seeks to make adults aware of their role in children's learning through several platforms. One of them is the Espaço de Leitura, which offers digital books accompanied by games, suggestions for parents on how to explore the stories further, and activities that support the child's learning. The books can be listened to, with the option of also watching someone read them. Click here to access the site.
http://espacodeleitura.labedu.org.br/
Coleção Kidsbook – Leia para uma criança
The site "Eu leio para uma Criança," an initiative of the bank Itaú, has several digital books with interactive illustrations. They include well-known works such as "Malala, a menina que queria ir para a escola" and "O sétimo gato," by Luis Fernando Verissimo. All of them can be read on the site, where the illustrations move and there are sounds. Seven are also available as PDFs and can be accessed even without a connection. Click here to access the site and see all the available books, or click the titles below to download some of the PDFs.
https://www.euleioparaumacrianca.com.br/estante-digital/
As Bonecas da Vó Maria – by Mel Duarte
https://www.euleioparaumacrianca.com.br/estante-digital/as-bonecas-davo-maria/
A Menina das Estrelas – by Tulipa Ruiz
https://www.euleioparaumacrianca.com.br/estante-digital/a-menina-dasestrelas/
O Cabelo da Menina – by Fernanda Takai
https://www.euleioparaumacrianca.com.br/estante-digital/o-cabelo-damenina/
Meu Amigo Robô – by Giselda Laporta Nicolelis
https://www.euleioparaumacrianca.com.br/estante-digital/meu-amigo-robo/
Azizi, o Menino Viajante – by Conceição Evaristo
O Menino e o Foguete – by Marcelo Rubens Paiva
https://www.euleioparaumacrianca.com.br/estante-digital/o-menino-e-ofoguete/
A Canção dos Pássaros – by Zeca Baleiro
Read also – Letting children spend time in front of screens may not be so bad, experts say

Collection about women entrepreneurs
To mark Women's Month in March, the fintech SumUp released three illustrated digital books with stories about women entrepreneurs: "A menina que construía," "Vendedora de sonhos," and "A mãe que fazia mágica." The narratives are based on real stories and encourage society to rethink the importance of women in the job market. Click here to access the titles, which remain available for download or online reading.
Gibis da Turma da Mônica
Comic books are great allies for parents trying to spark a taste for reading, and they too have digital representatives in this quarantine. The Banca da Mônica app, which carries many digital comics with the adventures of Mônica and her friends, offers some free issues for download in the "special editions" category, among them "Trabalho infantil, nem de brincadeira!". The app works by subscription, but this package of comics can be read entirely for free. Just download it on Android or iOS.
Read also: Games and books can be used to develop
34 Best Sites to Download Free E-books
https://livrariapublica.com.br/34-melhores-sites-para-baixar-e-booksgratis/
Hello, visitor – looking for free e-books?
Although many readers still prefer physical books, e-books have the definite advantage of being easy to carry wherever you go. There are also many ways to acquire free e-books, especially those in the public domain.
Best sites to download e-books:
Here is our list of the 34 best sites for downloading free e-books to read on a Kindle, Kobo, tablet, computer, smartphone, or other mobile device.

1. Project Gutenberg
Project Gutenberg offers more than 57,000 free public-domain eBooks, free to read and redistribute. There are no fees and no custom apps required. You will not find the latest best-sellers on Project Gutenberg, but you will find many excellent classic books available 24 hours a day, 7 days a week, at no cost.
https://www.gutenberg.org/browse/languages/pt
https://www.gutenberg.org/
2. Open Library
Open Library is a collaborative, open, nonprofit digital library available to the public, with a large collection of free books.
https://openlibrary.org/
3. Google Livros
Google Livros offers the option of accessing free books from its enormous collection, which features hundreds of classics and contemporary best-sellers.
4. Amazon Free eBooks
Amazon's free-eBooks section offers the best free books that are available.

5. Internet Archive
The Internet Archive offers more than 15 million books and texts for download. It also encourages the global community to contribute physical items and to upload digital materials directly to the Internet Archive.
https://archive.org/details/texts
6. ManyBooks
ManyBooks has a selection of more than 50,000 modern and classic books waiting to be discovered, all free and available online.
https://manybooks.net/

7. BookBoon
BookBoon is the world's largest online publisher of educational literature, offering more than 1,000 free e-books for download.
https://bookboon.com/
8. LibGen/Library Genesis
LibGen is a search engine that helps you download books and articles.
http://libgen.rs/
9. FreeBookSpot
FreeBookSpot is a library of e-book links where you can find and download free books in more than 90 categories. Do you like e-books? This is the place for you!
http://www.freebookspot.club/default.aspx
10. Free eBooks
The Free eBooks site offers thousands of free e-books to read wherever you like, featuring some of the best categories: fiction, nonfiction, romance, science fiction, self-help, and business.
https://www.free-ebooks.net/
11. LibriVox
LibriVox offers free audiobooks of public-domain works, read by volunteers around the world.
https://librivox.org/
12. GetFreeEBooks
GetFreeEBooks features categories such as Fiction, Science Fiction, Fantasy, Short Stories, Horror, and more.
https://www.getfreeebooks.com/
13. FreeComputerBooks
FreeComputerBooks has free books in categories such as computing, mathematics, programming, tutorials, and other technical subjects.
https://freecomputerbooks.com/
14. Baen
Baen features categories such as science fiction and fantasy. The books can be downloaded as a zip file; you need an app such as WinRar to unzip it.
https://www.baen.com/catalog/category/view/s/free-library/id/2012
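A dedicated unzip app is not strictly required, by the way: most programming environments ship an archive module. Here is a minimal sketch using Python's standard-library zipfile module (the file name free-library.zip is only an illustrative placeholder, not a real download from the site):

```python
import zipfile
from pathlib import Path

def extract_ebooks(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a downloaded e-book zip archive and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)   # write every member into dest_dir
        return zf.namelist()  # list of file names inside the archive

# e.g. extract_ebooks("free-library.zip", "ebooks/")
```

The same function works for any zip archive, so it can be reused for other sites on this list that bundle their downloads.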
15. KnowFree
KnowFree is a site where professionals can access research, reports, magazines, e-books, and technical tutorials for free.
16. Open Culture
Open Culture offers 800 free eBooks for your Kindle, in categories that include great works of nonfiction, fiction, and poetry.
https://www.openculture.com/free_ebooks
17. BookYards
BookYards provides educational materials, information, documents, reference materials, and free content for everyone.
https://www.bookyards.com/en/welcome
18. FeedBooks
FeedBooks is a digital library and cloud-publishing service where you will discover thousands of ebooks, including new releases, along with a collection of free public-domain books to read on any mobile device.
https://www.feedbooks.com/
19. The Online Books Page
The Online Books Page gives access to books that can be read for free on the Internet; its collection holds more than 2 million free books.
https://onlinebooks.library.upenn.edu/
20. PDF Books World
With PDF Books World you can read online or download high-quality PDF copies of public-domain books.
https://www.pdfbooksworld.com/books
21. International Children's Digital Library
Access stories from all over the world at the International Children's Digital Library.
http://en.childrenslibrary.org/
22. Wikibooks
Wikibooks is a collection of open-content textbooks that, in the spirit of Wikipedia, volunteers can edit.
https://en.wikibooks.org/wiki/Main_Page
23. Planet eBook
Planet eBook is a great source of classic literature in e-book format.
https://www.planetebook.com/
24. Portal Domínio Público
The Portal Domínio Público is a virtual library created to publicize classic works.
http://www.dominiopublico.gov.br/pesquisa/PesquisaObraForm.jsp
25. Global Grey Ebooks
Free and, of course, DRM-free: epubs, Kindle ebooks, and PDFs. Download from Global Grey's library of more than 3,000 quality e-books; no registration or payment required.
https://www.globalgreyebooks.com/index.html
26. Brasiliana
The Biblioteca Brasiliana site, from the Universidade de São Paulo (USP), legally offers around 3,000 books for download, including rare books and historical documents, manuscripts, and images.
https://digital.bbm.usp.br/handle/bbm/1
27. Biblioteca Digital de Obras Raras
The Biblioteca Digital de Obras Raras site, conceived by the Universidade de São Paulo, offers complete works in different languages.
https://obrasraras.usp.br/
28. Biblioteca Nacional de Portugal
Highlights of the Biblioteca Nacional de Portugal portal include a site dedicated to the writer José Saramago, where manuscripts by the author are available.
http://www.bnportugal.pt/
29. Machado de Assis
Created by the MEC, the Machado de Assis site offers the writer's complete works – in PDF or HTML – for online reading: chronicles, novels, short stories, poetry, plays, criticism, and translations.
30. Biblioteca Digital Mundial
The Biblioteca Digital Mundial offers thousands of historical documents from different parts of the world. Multilingual, the material is available for online reading.
https://www.wdl.org/pt/
31. eBooks Brasil
eBooks Brasil is a well-known Brazilian site that offers free electronic books in several formats.
https://ebooksbrasil.org/
32. Monkey Pen
The Monkey Pen site offers a collection of free children's books.
https://monkeypen.com/pages/free-childrens-books
33. Kids World Fun
Kids World Fun is another site with many children's books in PDF. The books are colorful and easy to read – great for children learning English.
https://www.kidsworldfun.com/ebooks.php
34. Livronautas
Livronautas is a noncommercial site built for internet users who enjoy reading.
https://www.livronautas.com.br/livros/e-books-disponiveis

Relevant information related to book reading and its interconnected aspects in the school environment
Part 2
Data obtained from the Internet on June 13, 2021
Rodrigo Nunes Cal
Escola Municipal Dr. Cenobelino de Barros Serra
São José do Rio Preto – SP

BaixeLivros, besides being a virtual library that seeks to strengthen the relationship between reader and author, is also an academic social network that aims to bring students and teachers closer together online. Its goal is to encourage the reading of classic works, promoting the exchange of ideas and building a bridge between present and past – and thereby refining the country's morals and culture.
Literatura Infantil – children's digital books (e-books) -> Ages: 0 to 2; 3 to 5; 6 to 8; 9 to 12 https://www.baixelivros.com.br/literaturainfantil

Textbooks – https://www.baixelivros.com.br/didaticos
Course books (professional training) – https://www.baixelivros.com.br/biblioteca/cursos
Religious books – https://www.baixelivros.com.br/biblioteca/religiao
Cookbooks – https://www.baixelivros.com.br/biblioteca/culinaria
Romance books – https://www.baixelivros.com.br/biblioteca/romance
Action and adventure books – https://www.baixelivros.com.br/biblioteca/aventura
Exact sciences books – https://www.baixelivros.com.br/biblioteca/ciencias-exatas
Human and social sciences books – https://www.baixelivros.com.br/ciencias-humanase-sociais
Digital collection (human and social sciences) – History books https://www.baixelivros.com.br/historia
Philosophy books – https://www.baixelivros.com.br/biblioteca/ciencias-humanas-esociais/filosofia
Law books – https://www.baixelivros.com.br/biblioteca/ciencias-humanas-esociais/direito
Arts books – https://www.baixelivros.com.br/biblioteca/ciencias-humanas-esociais/artes
Geography books – https://www.baixelivros.com.br/biblioteca/ciencias-humanas-esociais/geografia
Language books – https://www.baixelivros.com.br/biblioteca/ciencias-humanas-esociais/idiomas
Marketing books – https://www.baixelivros.com.br/biblioteca/ciencias-humanas-esociais/markenting
Sociology books – https://www.baixelivros.com.br/biblioteca/ciencias-humanas-esociais/sociologia
Biological sciences books – https://www.baixelivros.com.br/biblioteca/biologicas-esaude
Cordel literature books – https://www.baixelivros.com.br/biblioteca/literatura-decordel
Brazilian literature books – https://www.baixelivros.com.br/biblioteca/literaturabrasileira
Foreign literature books – https://www.baixelivros.com.br/biblioteca/literaturaestrangeira
Most downloaded books (highlights of the week) – https://www.baixelivros.com.br/trending
Comics to read online for free
Comics artist Ana Paloma Silva recommends brand-new, weekly, free comics to read during the quarantine
Biblioteca Monteiro Lobato
General collection
The library has a collection of 68,000 items, made up of literature and nonfiction books, magazines, multimedia, etc.
The entire book collection can be found in the online catalog of the Sistema Municipal de Bibliotecas. Ask about other titles in person or by phone.
Most works can be borrowed by users registered at the library; see here how to sign up.
Gibiteca
Albums, comic books, manga, and RPG.

The Gibiteca Henfil at the Centro Cultural São Paulo has a collection of more than 10,000 titles, including comics, fanzines, periodicals, and books about comics. It is a meeting point for fans and professionals in the field, who gather there on weekends to draw, exchange experiences, or make fanzines. The collection is available for consultation, and part of it can be borrowed upon registration. The gibiteca maintains ongoing activities such as comic-swap fairs, Japanese animation screenings, RPG, talks, and workshops.
http://centrocultural.sp.gov.br/bibliotecas/#gibitecahenfil
Libraries and Special Collections
The CCSP libraries are visited by around a thousand people every day. Besides collections totaling approximately 120,000 books and documents, they also offer public spaces for study and socializing.
Biblioteca Pública Municipal Louis Braille
Planned and equipped to serve people with visual impairments, it gathers around five thousand titles, including braille books and audiobooks, as well as computers and scanners.
Opening hours:
Tuesday to Friday, 10am to 7pm
(Entry is allowed until 30 minutes before closing)
Contact:
3397-4088
bibliotecabraille@prefeitura.sp.gov.br
Biblioteca Pública Municipal Sérgio Milliet
The second-largest public library in the city of São Paulo, with a multidisciplinary collection of more than 110,000 titles.
Opening hours:
Tuesday to Friday, 10am to 8pm
Saturdays and Sundays, 10am to 6pm
(Entry is allowed until 30 minutes before closing)
Contact:
3397-4003
bibliotecaccsp@prefeitura.sp.gov.br
Learn more
Coleção de Artes Alfredo Volpi
The Alfredo Volpi collection is the section of the Biblioteca Sérgio Milliet that houses the CCSP's art holdings, which came from the Biblioteca Mário de Andrade and from IDART (Departamento de Informação e Documentação Artísticas).
Opening hours:
Tuesday to Friday, 10am to 8pm
Saturdays and Sundays, 10am to 6pm
(Entry is allowed until 30 minutes before closing)
Contact:
3397-4087
bibliotecaccsp@prefeitura.sp.gov.br
Gibiteca Henfil
A collection of more than 10,000 titles, including comic albums, gibis, periodicals, and books about comics.
Opening hours:
Tuesday to Friday, 10am to 8pm
Saturdays and Sundays, 10am to 6pm
(Entry is allowed until 30 minutes before closing)
Contact:
3397-4090
gibiteca@prefeitura.sp.gov.br
http://centrocultural.sp.gov.br/bibliotecas/#gibitecahenfil
Rua Vergueiro, 1000 (Vergueiro metro station)
Paraíso – 01504-000 – São Paulo, SP
Tel.: 11 3397-4090
E-mail: gibiteca@prefeitura.sp.gov.br
Hours: Tuesday to Friday, 10am to 8pm; Saturdays and Sundays, 10am to 6pm
(Entry is allowed until 30 minutes before closing)

Sala de Leitura Infantojuvenil
The children's and young-adult reading room is a shared space dedicated to forming readers.
Opening hours:
Tuesday to Friday, 10am to 8pm
Saturdays and Sundays, 10am to 6pm
(Entry is allowed until 30 minutes before closing)
Contact:
3397-4083
bibliotecaccsp@prefeitura.sp.gov.br
Children's and young-adult literature – a collection of specialized works on children's and young-adult literature (prior booking required).
• The Bibliography and Documentation Section of the Biblioteca Monteiro Lobato holds one of the country's most important collections on children's and young-adult literature and has published the Bibliografia Brasileira de Literatura Infantil e Juvenil since 1941.
• Rare works collection: Brazilian and foreign children's literature, plus holdings on the life and work of Monteiro Lobato; it has around 4,500 items, formed mostly by donations from the writer's family: books, photographs, furniture, personal objects, and correspondence.
School Books
Historical Collection of School Books (Acervo Histórico de Livros Escolares – AHLE). From the material found in children's libraries, primers, teaching manuals, and textbooks were selected: a set of books covering school subjects from the elementary and secondary courses.

Documentary Memory
The library's historical-documentary archive gathers the history of the children's libraries department, with documents and photos of Timol, Tibbim, the Turistinhas Municipais, the Academia Juvenil de Letras, and the newspaper A Voz da Infância.
For management information and statistical data about this library and other units of the Coordenadoria do Sistema Municipal de Bibliotecas, in accordance with the Access to Information Law 12.527 of 11/18/2011, see Biblioteca em Números.
SÃO PAULO
This Monday (the 12th), Librarian's Day is celebrated. The date chosen for the commemoration honors the librarian, advertising man, and poet Manuel Bastos Tigre (1882-1957), who was born on March 12.
To celebrate the profession, the Guia put together a list of six São Paulo libraries with collections that should please comics fans. Below are suggestions of where to find comic books and graphic novels in the city's libraries.
Founded in 1925, the library gained a room dedicated to comics fans in August 2016. In the space you can find piles of superhero and Turma da Mônica comics, as well as children's, biographical, and erotic titles, for example.
R. da Consolação, 94, Consolação, central region, tel. 3775-0002. Mon. to Fri.: 8am to 10pm. Sat. and Sun.: 8am to 8pm
Biblioteca de São Paulo
Located on the grounds of the former Carandiru penitentiary, it has a small but diverse catalog – from underground comics to adventure titles – with a particular focus on humor publications.
Pq. da Juventude – Av. Cruzeiro do Sul, 2.630, Canindé, northern region, tel. 2089-. Tue. to Sun.: 9:30am to 6:30pm.
Centro Cultural da Juventude
This youth cultural center offers a program of shows, workshops, and exhibitions, as well as a library where comics fans have access to graphic novels and manga.
Av. Deputado Emílio Carlos, 3.641, Vila Nova Cachoeirinha, northern region. Tue. to Sat.: 10am to 10pm. Sun.: 10am to 6pm.
Biblioteca e Gibiteca Sesi
Curated by Álvaro de Moya (1930-2017), one of Brazil's leading comics scholars, the gibiteca ranges from titles by famous authors, such as Will Eisner, to rare albums, such as the collections of Ebal, a Brazilian comics publisher that closed in the 1990s.
R. Carlos Weber, 835, Vila Leopoldina, western region, tel. 3834-5523. Mon.: 9am to 6pm. Tue. to Fri.: 7am to 6pm. Sat.: 10am to 4pm.
Gibiteca Henfil
The gibiteca, which honors the cartoonist who created characters such as Graúna, is located at the Centro Cultural São Paulo and has around 10,000 titles in its collection, including albums, magazines, and periodicals. On the shelves you can find everything from yellowed 1960s issues, such as "Batman," to books released in recent years, such as "A Diferença Invisível" (2017).
CCSP – libraries – R. Vergueiro, 1.000, Liberdade, central region, tel. 3397-. Tue. to Fri.: 10am to 8pm. Sat. and Sun.: 10am to 6pm.
Biblioteca Monteiro Lobato
Focused on children and young adults, this downtown library has a gibiteca with around seven thousand titles, including albums, comic books, manga, and even RPG games.
Biblioteca Infantojuvenil Monteiro Lobato – R. Gen. Jardim, 485, Vila Buarque, central region, tel. 3256-4122. Mon. to Fri.: 8am to 6pm. Sat.: 10am to 5pm. Sun.: 10am to 2pm.
Comics – Gibiteca: a library just for comics
Comics to read online – https://pt.calameo.com/accounts/2722857
11 sites to download comics for free
Comics began in Brazil in the 19th century, adopting a satirical style known as cartoons, charges, or caricatures, which would later settle into the popular comic strips. The publication of dedicated comics magazines in Brazil began in the early 20th century. Comics taste like childhood, but it is not only children who love the characters and their stories. There are several sites for getting free comics.

1 – Marvel.com
Would you like to read famous comics? If you answered yes, then one of your first stops should be the Marvel.com site. Here you will find Spider-Man, Iron Man, the Avengers, Captain America, Daredevil, and much more!
https://www.marvel.com/marvel-comicstore
2 – The Digital Comic Museum
If you like stories published 60 or 70 years ago, this is the site. It has a large selection of classic comics from Charlton, Fawcett, Ace, Ajax-Farrell, Fox Feature Syndicate, and more.
https://digitalcomicmuseum.com/index.php
3 – Dark Horse Digital (unavailable)
Dark Horse Comics has major series such as Hellboy, Star Wars, and Aliens. Its online store requires a free sign-up before you can browse.
https://digital.darkhorse.com/
4 – ComiXology
Free comics for your enjoyment! The free selections are updated weekly.
https://www.comixology.com/free-comics
5 – DC Nation
DC Comics is the site for Superman, Batman, Wonder Woman, and Captain Marvel comics. It is a good place to introduce young readers to the wonders of these characters!
https://www.dckids.com/#comics
6 – Internet Archive
Among the many things they archive for public use is a huge collection of comics.
https://archive.org/details/comics
7 – Image Comics (unavailable)
The site has some of the most popular series in comics, with titles such as Witchblade, Spawn, and Savage.
https://imagecomics.com/
8 – ElfQuest.com
The site offers many comics series from the 1970s.
9 – Newsarama.com
Newsarama is a site that sells comics, but there is a good selection of complete comics to read online for free.

10 – Drive Thru Comics
Drive Thru has plenty of free comics produced by major publishers.
https://www.drivethrurpg.com/
11 – Comic Book Plus
This site offers comics from the golden age of comic books, with every kind of story available to download for free.
https://comicbookplus.com/
The world's largest children's and young-adult library!
Sunday, June 14, 2009
http://leconcierge.com.br/blog/kids/209,A-MAIOR-BIBLIOTECA-INFANTOJUVENIL-DO-MUNDO-
Housed in Blutenburg Castle in Munich, the International Youth Library holds the world's largest collection of children's and young-adult literature. The 15th-century building houses a collection of more than 580,000 titles, in 130 languages, published over the past 400 years. It even includes Monteiro Lobato and Ana Maria Machado!
Internationale Jugendbibliothek – Internationale Jugendbibliothek (ijb.de)

MAR 11
JELLA LEPMAN AND THE WORLD'S LARGEST CHILDREN'S AND YOUNG-ADULT LIBRARY
http://www.ecofuturo.org.br/blog/jella-lepman-e-a-maior-biblioteca-infanto-juvenil-do-mundo/
Amid the rubble of World War II, whose atrocities left humanity deeply embittered toward its own kind, awakening the suspicion that reason alone cannot save mankind, one woman was caring for orphaned children. A great caretaker, she saw in reading and literature the possibility of rebuilding the world. After all, people had not come through the Great War unscathed or unpunished, but books, with all the power they carry, remained. If so, the path would be to use these works to form the future men and women who would one day hold in their hands the possibility of giving the world new shapes.

But the world belongs to everyone – or rather, it is all of us – and no one can change it alone. Not a single book, a single author, or the single language of a single individual. So Jella Lepman gathered, as best she could, all the prose and poetry of the planet in a library bus and held an international exhibition of children's books, in many languages, in 1946. The seed was planted for the International Youth Library of Munich, founded in 1949 in Germany, the largest library of children's and young-adult literature in the world.

There are around one million titles, in 120 languages. Every year, the Library selects hundreds of children's and young-adult books published in more than 70 countries. Children of every nationality who visit the collection find quality literature in their own language – and that is precisely the intention, since the construction of the world is polyglot. By bringing books to children who had lost everything in the war, Jella Lepman believed in understanding through enchantment between different peoples and cultures. In the encounter with our differences, we discover or build our way of being human, and that is how we invent what we call humanity. This remains the motto of this great Library.

Since 1983, the International Youth Library of Munich has been housed in Blutenburg Castle, a building from 1435 that, with the arrival of the collection, became known as "The Castle of Books." Besides the library, Jella Lepman also conceived the International Board on Books for Young People (IBBY), an international nonprofit organization dedicated to youth books. In Brazil, the entity is represented by the Fundação Nacional do Livro Infantil e Juvenil (FNLIJ). IBBY created the international Hans Christian Andersen Award, considered the Nobel Prize of children's and young-adult literature, which has already honored the Brazilian authors Lygia Bojunga (1982) and Ana Maria Machado (2000).
Visit the Biblioteca Monteiro Lobato, Brazil's largest collection of children's and young-adult literature
The space holds 90,000 items, some of them available online
https://www.educacao.sp.gov.br/visite-a-biblioteca-monteiro-lobato-o-maior-acervo-deliteratura-infantojuvenil-do-brasil/
Opened by the writer Mário de Andrade in 1936, when he was director of the Departamento Municipal de Cultura, today's Biblioteca Monteiro Lobato is the oldest of its kind in the country. It was created as the Biblioteca Infantil Municipal, but in 1955 it received the author's name, honoring one of Brazil's leading writers of children's stories.
The library has a collection of 90,000 items, made up of literature and nonfiction books, magazines, and multimedia material. Most of these works are also listed online, through the Catálogo Online. Besides offering a great environment for reading, those registered at the institution can take books home and return them when they finish, within a pre-established deadline. Its gibiteca holds some 7,000 copies of albums, manga, comic books, and RPG; visitors can also book an appointment to access specialized works.
10 free virtual libraries
https://canaldoensino.com.br/blog/10-bibliotecas-virtuais-gratuitas
Virtual libraries emerged as a way to democratize information around the world, which makes them great allies for teachers. In them you can find works that are expensive or not yet available at your school or city library. To help with the search, Revista Nova Escola prepared a selection of 10 virtual libraries where users can download many literary classics and other works for free.
Brasiliana USP
http://www.brasiliana.usp.br/
The library, which does not yet have a permanent physical home at the Universidade de São Paulo, already has a small percentage of its collection online, focused on Brazilian authors and public-domain works tied to national culture. Highlights include three volumes of engravings made by Debret during his journey through 19th-century Brazil, all the first editions of the works of Machado de Assis, José de Alencar, and Olavo Bilac, and the collection of Klaxon, one of the main magazines of the São Paulo modernist movement. Many of the rare works come with introductory texts.
Domínio Público
http://www.dominiopublico.gov.br/pesquisa/PesquisaObraForm.jsp
Developed by the Ministério da Educação, this library freely offers around 180,000 texts, plus image, audio, and video files. The site has a rich collection of publications in the field of education. There you can also find the complete works of Machado de Assis and a large collection of poems by Fernando Pessoa, as well as many Brazilian classical-music recordings, countless children's literature texts, and compilations of education legislation.
Biblioteca Nacional
http://www.bn.br/portal/
One of the country's largest collections already has a good part of its virtual version cataloged, mainly in periodicals. Although it contains a certain amount of literary material, the cataloged library still focuses on maps, photographs, and periodicals. Worth checking out are the original version of Os Lusíadas, the Bible in Latin, and the Teresa Cristina Collection – a series of documents donated to the museum by Dom Pedro II's wife, with extremely rich records of the Empire.
Arquivo Público do Estado de São Paulo
An excellent option for those looking for historical records related to the
State of São Paulo, the State Public Archive offers a range of newspapers,
magazines, photographs, videos and statistical yearbooks. Highlights of the
collection include a set of documents with letters exchanged by the leaders of
the 1924 Revolution. The Memória da Educação section is a treasure trove for
teachers, with historical publications that evoke the school world of 19th- and
20th-century São Paulo.

Digital Cordel Collection of the Átila de Almeida Rare Books Library –
UEPB
http://cordeis.bc.uepb.edu.br/index.php
In recognition of the richness and distinctiveness of Northeastern culture, the
library offers about 9,000 titles and 15,000 copies of cordel literature from
the Átila Almeida Rare Books Library, maintained by the Universidade Estadual
da Paraíba (UEPB). The library presents itself as Brazil's foremost guardian of
this kind of collection, both in quantity and in quality (the state of
conservation and organization of the pieces).
Biblioteca Digital Paulo Freire
http://www.paulofreire.ce.ufpb.br/paulofreire/principal.jsp
Belonging to the Universidade Federal da Paraíba, this library offers a rich
collection on the educator, including his texts, books, teaching materials and
correspondence, as well as many critiques and analyses of his work.
Biblioteca Digital Mundial
Created by UNESCO, the World Digital Library makes important works of
literature, audio recordings, maps and photographs from countries and cultures
all over the world freely available on the Internet in a multilingual format.
Searches can be performed in seven different languages and filtered by time
period.
Coleção Aplauso
http://aplauso.imprensaoficial.com.br/
It offers biographies of Brazilian artists, filmmakers and playwrights, as well
as film scripts, plays and the history of several TV networks. For readers who
prefer print, all titles can be found in bookstores across the country.

Wikilivros
http://pt.wikibooks.org/wiki/Wikilivros
A Wikimedia Foundation project dedicated to the collaborative development of
free-content books, workbooks, manuals and other educational texts. It covers
many subjects in many languages, and there is a Portuguese version of the site.
Banco de Dados de Livros Escolares Brasileiros (1810 a 2005) – FEUSP
http://www2.fe.usp.br/estrutura/livres/index.htm
A database that provides online access to the output of Brazil's various school
subjects from the 19th century to the present day, supplying references and
sources. LIVRES is continuously fed and expanded by the research of a team of
specialists in the field who analyze the books.
Check out 45 free virtual libraries in Brazil and around the
world
Published by Educredi on 22 April 2020
Virtual libraries keep gaining ground, and users can find works of all kinds,
from literary classics to scientific papers. We have selected 45 libraries that
offer free content; take a look!

1. Biblioteca Digital Camões
The Biblioteca Digital Camões holds a collection of more than 3,000 titles, with
content on literature, linguistics, art, cinema, architecture, geography,
history and education. To browse the content, visit the Biblioteca Digital
Camões website.

https://www.instituto-camoes.pt/activity/servicos-online/biblioteca-digital
Camões, I.P. makes a set of texts and documents of great cultural and linguistic
relevance available in its Digital Library, aiming to bring the Portuguese
language to an ever wider universe of speakers and students.
It also has a vast network of libraries, comprising the Biblioteca Camões, I.P.
(at its headquarters in Lisbon), heir to the book collections of the former IPAD
and Instituto Camões, the 17 libraries available at the Portuguese Cultural
Centres (CCP) in various countries, and the 71
http://bibliotecasicl.pt/Opac/Pages/Search/SimpleSearch.aspx?Database=105199_BDC
Camões – Instituto da Cooperação e da Língua, I.P.
Av.ª da Liberdade, 270 – 1250-149 Lisboa | Telephone: +351 213109100 | Fax: +351 213143987
| Email: geral@camoes.mne.pt

1. Biblioteca Digital Luso-Brasileira
With more than 2 million items, the Biblioteca Digital Luso-Brasileira brings
together public-domain works from Brazil and Portugal.
All of these works can be found on the library's page.
https://bdlb.bn.gov.br/
2. Biblioteca Digital Mundial
The World Digital Library is a project supported by the United Nations (UN) with
the participation of libraries from 193 countries. Its cultural and scientific
materials are presented in several languages so that many nations can access
them.
Access the content of the World Digital Library here.
https://www.wdl.org/pt/
3. Biblioteca Europeana
Europeana involves more than 1,600 institutions that together built the digital
library. The collection holds more than 30 million items, making it the largest
cultural and scientific collection in the world.
The materials include images, paintings, drawings, maps, photos, books,
newspapers, letters, diaries, videos and audio recordings.
To browse all the titles, visit the Europeana website.
https://www.europeana.eu/pt
4. Coleção Aplauso
The Coleção Aplauso has the Imprensa Oficial de São Paulo as its main
collaborator. The collection offers more than 550 works to read online or
download as PDFs.
The materials include biographies, film scripts, plays and the histories of
major television networks, among other content.
Browse the whole collection on the Coleção Aplauso website.
https://aplauso.imprensaoficial.com.br/lista-livros.php
5. Coleção Brasiliana
Through the Biblioteca Brasiliana, the Universidade de São Paulo (USP) provides
about 1,500 books, as well as manuscripts, historical documents and images.
Access the available materials on the Coleção Brasiliana platform.
https://digital.bbm.usp.br/handle/bbm/1
6. Coleção Biblioteca do Senado
The Coleção Biblioteca do Senado is made fully available in the Senate's Virtual
Bookstore. There are 125 free books in many areas, such as literature,
journalism, literary criticism, photography, architecture, history, law,
politics and biography.
All the material is available on the Coleção Biblioteca do Senado page.
A major partnership between the Ministry of Education, UNESCO and the Fundação
Joaquim Nabuco presents a collection of 62 books about leading figures in
education.
See all of this material here.
8. Coleção Fernando Pessoa
Fernando Pessoa, the great Portuguese poet, left important literary works. The
Portal Domínio Público makes the entire collection of Fernando Pessoa and his
heteronyms available.
Browse all the works of the Coleção Fernando Pessoa on the site.
9. Coleção Machado de Assis
Machado de Assis was one of the greatest Brazilian writers. For that reason, the
Universidade Federal de Santa Catarina and the Portal Domínio Público have
partnered to offer the author's entire body of work to readers.
Just visit the Coleção Machado de Assis site and pick some of his works.
10. Coleção Open Library
The Open Library collection brings together more than 3 million books in
different languages, most of them in English. The collection is an initiative of
the Internet Archive and the Austin Foundation aimed at democratizing reading.
All the materials are available on the Open Library page.
https://openlibrary.org/subjects/accessible_book#ebooks=true
11. Coleção SciELO Livros
This collection is a SciELO Livros project whose purpose is to publish
collections of scientific books online. More than 500 books are made available.
To browse these materials, visit the Coleção SciELO Livros page:
http://search.livros.scielo.org/search/?output=site&lang=pt&from=181&sort=&format=summary&count=20&fb=&page=10&q=&index=tw&where=BOOK&filter[is_comercial_filter][]=f
12. Portal Domínio Público
The Portal Domínio Público has a collection of more than 200,000 titles, making
it the largest Brazilian virtual library. The works in the portal are either
authorized materials or works in the public domain.
The collection is available on the Portal Domínio Público.
http://www.dominiopublico.gov.br/pesquisa/PesquisaObraForm.jsp
13. Projeto Gutenberg
Project Gutenberg is the world's oldest digital library, created in 1971. Its
collection comprises approximately 40,000 titles.
The books can be found on the Project Gutenberg portal.
https://www.gutenberg.org/wiki/PT_Principal
15 – Biblioteca UEPB – Acervo Digitais de Cordéis da Biblioteca de Obras Raras de Átila de
Almeida – http://biblioteca.uepb.edu.br/
17 – FEUSP – Banco de Dados de Livros Escolares – http://www2.fe.usp.br:8080/livres/
18 – Guia de Educação – https://canaldoensino.com.br/blog/category/livros-gratis
19 – Biblioteca Nacional da Universidade de Córdoba – https://rdu.unc.edu.ar/
20 – Biblioteca da Universidade Nacional de La Plata –
http://www.memoria.fahce.unlp.edu.ar/
21 – BDTD – Biblioteca Digital Brasileira de Teses e Dissertações
22 – Biblioteca Digital da Escola de Música da UFRJ –
23 – Biblioteca Digital da UNESP – https://bibdig.biblioteca.unesp.br/
24 – Biblioteca Digital da Unicamp – http://www.bibliotecadigital.unicamp.br/
25 – Biblioteca Digital Del Patrimônio Ibero-americano
26 – Biblioteca Digital do Supremo Tribunal Federal –
27 – Biblioteca Digital Paulo Freire
28 – Biblioteca Mundial Digital – https://www.wdl.org/pt/
29 – Biblioteca Nacional da Colômbia – https://bibliotecanacional.gov.co/es-co
30 – Biblioteca Nacional Digital Brasil – http://bndigital.bn.gov.br/
31 – Biblioteca Nacional Digital de Portugal – https://purl.pt/index/geral/PT/index.html
32 – Biblioteca On-line SEBRAE – https://bis.sebrae.com.br/bis/

33 – Biblioteca Virtual do Rio Grande do Sul – http://www.bibvirtual.rs.gov.br/
34 – Biblioteca Virtual nas áreas de Biblioteconomia e Ciência da Informação – http://bibci.wikidot.com/
35 – Digital Library Federation – https://www.diglib.org/
36 – Latindex – https://www.latindex.org/latindex/inicio
37 – Library of Congress – National Digital Library – https://www.loc.gov/
38 – LivRre! – Portal de Periódicos – http://antigo.cnen.gov.br/centro-de-informacoesnucleares/livre
39 – Bibliomania – http://www.bibliomania.com/
40 – Public Library of Science – https://plos.org/
41 – The Free Library – https://www.thefreelibrary.com/
42 – The On-line Books – http://onlinebooks.library.upenn.edu/
43 – Universia Livros – https://www.universia.net/br/actualidad
44 – Virtual Books – http://www.virtualbooks.com.br/editora/
45 – Wikilivros – https://pt.wikibooks.org/wiki/Wikilivros:P%C3%A1gina_principal
Discover 12 Digital Libraries for the Humanities
https://sistema.bibliotecas-rj.fgv.br/noticias/conheca-12-bibliotecas-digitais-de-cienciashumanas
Below are 12 digital humanities libraries that can help with your research!
• Bibliotecas Virtuais Temáticas – PROSSIGA
http://prossiga.ibict.br
Reference collections that gather and organize information available on the
internet about specific areas of knowledge. They are developed through
partnerships between IBICT – Instituto Brasileiro de Informação em Ciência e
Tecnologia – and institutions that wish to organize and disseminate their
thematic content on the Internet.

• Acervo Histórico da Assembleia Legislativa de São Paulo
http://www.al.sp.gov.br/web/acervo2/index_acervo.htm
The Historical Collection unit of the São Paulo Legislative Assembly makes more
than 350,000 pages of original documents related to the history of the São
Paulo legislature available for consultation.
• Biblioteca Digital do Senado Federal
The digital collection is varied, ranging from books, rare works, magazine
articles and newspaper reports to the intellectual output of senators and
Federal Senate staff, legislation in text and audio, and other documents.
• Biblioteca Virtual da América Latina
http://www.bvmemorial.fapesp.br
Developed by the Fundação Memorial da América Latina, located in the city of
São Paulo, with support from FAPESP, it aims to disseminate information and
knowledge about Latin America in the humanities, sciences and arts produced by
the Memorial da América Latina.
• Biblioteca Virtual de Direitos Humanos da USP
http://www.direitoshumanos.usp.br
Created by the Human Rights Commission of the Universidade de São Paulo, it
offers access to materials on the defense and promotion of human rights in
Brazil.
• Biblioteca On Line do SEBRAE
http://www.biblioteca.sebrae.com.br
An open space for building and sharing knowledge, it aims to contribute to
continuous learning about entrepreneurship, supporting the development and
strengthening of small businesses.
• Centro de Pesquisa e Documentação de História Contemporânea do Brasil da FGV
http://cpdoc.fgv.br
The site (and the research center) is run by the School of Social Sciences and
History of the Fundação Getúlio Vargas – FGV. It presents a range of content,
such as dossiers on Brazilian history (50 years of Brasília, the JK years,
Jango and the Vargas Era).
• Biblioteca do IBGE
http://biblioteca.ibge.gov.br
The library of the Instituto Brasileiro de Geografia e Estatística – IBGE holds
a vast collection of monographs, maps, publications, photographs, posters and
other content related to the territorial documentation of Brazil, as well as
the institution's own output.
• Biblioteca Digital Paulo Freire
http://www.paulofreire.ufpb.br/paulofreire/
A project developed by the Universidade Federal da Paraíba to disseminate the
work of the educator Paulo Freire.
• Observatório de Educação em Direitos Humanos – UNESP
http://unesp.br/observatorio_ses/index_cat3_areas.php
The observatory has existed since 2007 and is linked to UNESP. The site offers
abundant material on the subject; under "Biblioteca" you will find articles,
books, magazines and booklets.

• Biblioteca Virtual Miguel de Cervantes
http://www.cervantesvirtual.com
It provides information and access to content on Spanish literature, the
Spanish language, and children's and young adult literature. In Spanish.
• Biblioteca Virtual de Literatura
http://www.bibvirtuais.ufrj.br/literatura/
The Biblioteca Virtual de Literatura is a channel for dissemination and
information aimed at specialists and researchers, students and teachers of the
various literatures, as well as readers and internet users in general.
Source and text: Alexandre Pereira, Canal do Ensino
Tags:
• Ciências Humanas, FGV, Fundação Getúlio Vargas, BMHS, Biblioteca Mario Henrique Simonsen,
Biblioteca Digital
Biblioteca do Congresso – http://www.loc.gov
https://guia.melhoresdestinos.com.br/biblioteca-do-congresso-159-3712-l.html
Being the library with the largest collection in the world would already be
reason enough to visit the Library of Congress. But the reasons to go there
extend well beyond the 158 million items it holds.
The story begins in 1800, when the library opened in the Capitol building with
just 740 books. Years later, in 1814, the small library was burned by British
troops who invaded the Congress. The new collection was built from an offer by
President Thomas Jefferson: his private collection of 6,487 volumes was sold to
the library for just over 23,000 dollars, and the library continued to operate
in the Capitol. After another fire in 1851, a separate building was finally
inaugurated in 1897, already with National Monument status. It was considered
the safest library building in the world and the first to be built with
electrical wiring.

Even today the Renaissance-style Library of Congress building delights and
impresses visitors. It is hard not to spend a few minutes admiring the delicate
ornamental work throughout the interior. Among its many beautiful halls, the one
that draws the most attention is the reading room. With a 48-meter-high dome,
statues all around and grand columns, the readers seated in the room look almost
like miniatures. From above, even behind a large protective glass wall, the view
is gorgeous. The greatest stars of the collection are the Gutenberg Bible and
the Giant Bible of Mainz, alongside Thomas Jefferson's original library and the
first map to name America.
The library offers a guided tour, but visitors can also explore on their own.
Check the website for the schedule of the temporary exhibitions on display.
Admission is free.
Guided tours run every hour between 10:30 a.m. and 3:30 p.m.
Entry to the reading room is only allowed on a tour, either the full tour or one
dedicated to the room alone.
101 Independence Ave SE, Washington – ZIP: 20540
Telephone:
(202) 707-6400
Site:
http://www.loc.gov
Hours:
Monday to Saturday, 8:30 a.m. to 4:30 p.m.

Library of Congress
(Biblioteca do Congresso)
Thomas Jefferson Building
Established: 1800
Location: 101 Independence Avenue,[1] Capitol Hill, SE, District of Columbia
Collection
Size: 155,357,302 items in total[2]
Access and use
Population served: not open to the public
Members: the 535 members of the United States Congress, their staff, and
members of the public
Other information
Budget: US$613,496,414[2]
Website: http://www.loc.gov
National Register of Historic Places
U.S. National Historic Landmark
Designated NHL: December 21, 1965
NRHP reference: 66000000
The Library of Congress is the research library of the United States Congress;
it is the de facto national library of the United States and the country's
oldest cultural institution.

The Library of Congress holds more than 155 million items, including books,
manuscripts, newspapers, magazines, maps, videos and audio recordings, with
materials available in 470 languages, making it the largest library in the
world both in storage space and in number of books.[3][4]
https://pt.wikipedia.org/wiki/Biblioteca_do_Congresso
Source: Wikipédia, the free encyclopedia.
MEC launches a children's digital library with interactive materials;
check it out
The initiative also includes a series of videos with 20 Monteiro Lobato fables
narrated by the singer, composer and guitarist Toquinho

• by the editorial staff, 28 August 2020
The Conta pra Mim collection is now online: a mini digital library with 40
titles that can be accessed without registration. To access the content, just
go to the program's website: http://alfabetizacao.mec.gov.br/contapramim. You
can read the books online, print them if you wish, and even download versions
for the little ones to color in.
The initiative, from the Literacy Secretariat (Sealf) of the Ministry of
Education (MEC), also includes a series of videos with 20 Monteiro Lobato
fables narrated by the singer, composer and guitarist Toquinho, as well as
8 children's songs. So that the material reaches as many people as possible,
MEC prepared an accessible version of each video, with Libras (Brazilian Sign
Language) and subtitles, and made the songs available so that family members
and teachers can play them for the children.

Besides being on the program's website, all the videos are hosted online,
together with their respective stories and songs:
Conta pra Mim – Fables narrated by Toquinho
Conta pra Mim (Libras) – Fables narrated by Toquinho (accessible versions)
• A Assembleia dos Ratos
• A Menina do Leite
• A Mosca e a Formiguinha
• A Raposa e as Uvas
• Burrice
• O Cão e o Lobo
• O Carreiro e o Papagaio
• O Cavalo e o Burro
• O Galo que Logrou a Raposa
• O Gato Vaidoso
• O Orgulhoso
• O Pastor e o Leão
• O Ratinho, o Gato e o Galo
• O Rato da Cidade e o Rato do Campo
• O Sabiá e o Urubu
• O Útil e o Belo
• O Velho, o Menino e a Mulinha
• Os Dois Burrinhos
Canta pra Mim – Songs sung by Toquinho
Conta pra Mim (Libras) – Songs narrated by Toquinho (accessible versions)
Canta pra Mim | Libras | Songs sung by Toquinho
• Cai, cai balão / Marcha soldado
• Teresinha de Jesus
• O Cravo brigou com a Rosa
• Alecrim
• Se esta rua fosse minha
• Meu chapéu/Meu limão, meu limoeiro
• O trem maluco
• A canoa virou

Conta pra Mim
Conta pra Mim is the program that guides, promotes and encourages family
literacy, such as the practice of adults reading aloud to children, preparing
them for the literacy cycle. The goal is not to teach children to read early,
but to promote, through playful activities, the development of oral language,
contact with writing, and so on.
There are even tips, for example, for parents to read to their children while
they are still in the womb. Studies show this is one way babies become familiar
with their parents' voices. After birth, babies keep assimilating sounds and
words and, as they grow, they acquire vocabulary. This whole process yields
results and influences the literacy phase.
In addition to the Conta pra Mim collection, launched last Tuesday (25 August),
the program offers a Family Literacy Guide, validated by foreign experts, and a
set of 40 explanatory videos, in simple, accessible language, about the
practice of family literacy. All the material is available on the program's
page.
http://alfabetizacao.mec.gov.br/contapramim
There is also a playlist of narrated stories on the main music and podcast
platforms (Spotify, Deezer and SoundCloud). "MEC has also released a special
series of 17 videos with tips for parents to make better use of their time with
the children at home during this period of social distancing, which can be
watched on MEC's YouTube channel. More content will soon be made available on
the Conta pra Mim page," added MEC's Secretary of Literacy, Carlos Nadalim.

https://en.wikipedia.org/wiki/P-value

# p-value

In null hypothesis significance testing, the p-value[note 1] is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.[2][3] A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Reporting p-values of statistical tests is common practice in academic publications in many quantitative fields. Since the precise meaning of the p-value is hard to grasp, misuse is widespread and has been a major topic in metascience.[4][5]

## Basic concepts

In statistics, every conjecture concerning the unknown probability distribution of a collection of random variables representing the observed data X in some study is called a statistical hypothesis. If we state one hypothesis only, and the aim of the statistical test is to see whether this hypothesis is tenable but not to investigate other specific hypotheses, then such a test is called a null hypothesis test.

As our statistical hypothesis will, by definition, state some property of the distribution, the null hypothesis is the default hypothesis under which that property does not exist. The null hypothesis is typically that some parameter (such as a correlation or a difference between means) in the populations of interest is zero. Note that our hypothesis might specify the probability distribution of X precisely, or it might only specify that it belongs to some class of distributions. Often, we reduce the data to a single numerical statistic, e.g. T, whose marginal probability distribution is closely connected to a main question of interest in the study.

The p-value is used in the context of null hypothesis testing in order to quantify the statistical significance of a result, the result being the observed value of the chosen statistic T.[note 2] The lower the p-value is, the lower the probability of getting that result if the null hypothesis were true. A result is said to be statistically significant if it allows us to reject the null hypothesis. All other things being equal, smaller p-values are taken as stronger evidence against the null hypothesis.

Loosely speaking, rejection of the null hypothesis implies that there is sufficient evidence against it.

As a particular example, if a null hypothesis states that a certain summary statistic T follows the standard normal distribution N(0, 1), then the rejection of this null hypothesis could mean that (i) the mean of T is not 0, or (ii) the variance of T is not 1, or (iii) T is not normally distributed. Different tests of the same null hypothesis would be more or less sensitive to different alternatives. However, even if we do manage to reject the null hypothesis for all 3 alternatives, and even if we know that the distribution is normal and the variance is 1, the null hypothesis test does not tell us which non-zero values of the mean are now most plausible. The more independent observations from the same probability distribution one has, the more accurate the test will be, and the higher the precision with which one will be able to determine the mean value and show that it is not equal to zero; but this will also increase the importance of evaluating the real-world or scientific relevance of this deviation.

## Definition and interpretation

### General

Consider an observed test statistic t from an unknown distribution T. Then the p-value p is the prior probability of observing a test-statistic value at least as “extreme” as t if the null hypothesis H₀ were true. That is:

• p = Pr(T ≥ t | H₀) for a one-sided right-tail test,
• p = Pr(T ≤ t | H₀) for a one-sided left-tail test,
• p = 2 min{Pr(T ≥ t | H₀), Pr(T ≤ t | H₀)} for a two-sided test. If the distribution of T is symmetric about zero, then p = Pr(|T| ≥ |t| | H₀).
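As a concrete sketch of the three definitions above, assume for illustration that the statistic follows N(0, 1) under H₀; the standard normal CDF is then available in pure Python via `math.erf`, so the tail probabilities can be computed directly:

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF via the error function (stdlib math.erf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value(t: float, tail: str = "two-sided") -> float:
    """p-value of an observed statistic t under H0: T ~ N(0, 1)."""
    if tail == "right":
        return 1.0 - phi(t)              # Pr(T >= t | H0)
    if tail == "left":
        return phi(t)                    # Pr(T <= t | H0)
    # Two-sided: twice the smaller tail (N(0, 1) is symmetric about zero).
    return 2.0 * min(phi(t), 1.0 - phi(t))

print(round(p_value(1.96), 4))           # two-sided, ≈ 0.05
print(round(p_value(1.96, "right"), 4))  # right tail, ≈ 0.025
```

This recovers the familiar fact that |z| = 1.96 sits right at the conventional two-sided 5% boundary.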

If the p-value is very small, then either the null hypothesis is false or something unlikely has occurred. In a formal significance test, the null hypothesis H₀ is rejected if the p-value is less than a predefined threshold value α, referred to as the alpha level or significance level. The value of α is set by the researcher before examining the data. α defines the proportion of the distribution of T treated as so extreme that, if t falls within that region, its value is unlikely to have occurred by chance. Intuitively, if α is set to 0.10, only the most extreme 1/10th of the distribution of T counts, so a value of t falling there belongs to outcomes that occur only a rare 1/10th of the time under the null, suggesting it is unlikely to have arisen randomly. By convention, α is commonly set to 0.05, though lower alpha levels are sometimes used. However, a number of factors (such as variance, measurement errors, specification errors, and problems of multiple comparisons) may mean that t falling within the range specified by α does not automatically make a surprising value of t statistically significant.

The p-value is a function of the chosen test statistic T and is therefore a random variable. If the null hypothesis fixes the probability distribution of T precisely, and if that distribution is continuous, then when the null hypothesis is true the p-value is uniformly distributed between 0 and 1. Thus the p-value is not fixed: if the same test is repeated independently with fresh data (always with the same probability distribution), one will obtain a different p-value in each iteration. If the null hypothesis is composite, or the distribution of the statistic is discrete, then when the null hypothesis is true the probability of obtaining a p-value less than or equal to any number between 0 and 1 is less than or equal to that number. It remains the case that very small values are relatively unlikely if the null hypothesis is true, and that a significance test at level α is obtained by rejecting the null hypothesis whenever the p-value is less than or equal to α.
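The uniform-distribution property can be checked with a short simulation; this is a sketch that assumes a two-sided test whose statistic really is N(0, 1) under the null (the seed and repetition count are arbitrary choices):

```python
import math
import random

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(42)
n_tests = 20000

# Repeat the same two-sided test on fresh data; under a true null each
# repetition yields a new, independent draw of the p-value.
p_values = []
for _ in range(n_tests):
    t = random.gauss(0.0, 1.0)                      # statistic simulated under H0
    p_values.append(2.0 * min(phi(t), 1.0 - phi(t)))

# Uniform(0, 1) under the null: about a fraction alpha of all runs
# should fall at or below any threshold alpha.
fracs = {}
for alpha in (0.05, 0.25, 0.5):
    fracs[alpha] = sum(p <= alpha for p in p_values) / n_tests
    print(alpha, round(fracs[alpha], 3))
```

Each printed fraction should land close to its threshold, which is exactly what "the p-value is uniform under the null" means in practice.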

Different p-values based on independent sets of data can be combined, for instance using Fisher’s combined probability test.
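Fisher's method is simple enough to sketch in a few lines: with k independent p-values, X = −2 Σ ln pᵢ follows a chi-squared distribution with 2k degrees of freedom under the null, and for an even number of degrees of freedom the chi-squared survival function has a closed form, so no statistics library is needed (the three input p-values below are invented for illustration):

```python
import math

def fisher_combined(p_values):
    """Fisher's combined probability test.

    X = -2 * sum(ln p_i) ~ chi-squared with 2k degrees of freedom under H0;
    for even d.f. 2k the survival function is
    exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)   # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Three individually unconvincing p-values combine into a smaller one:
print(round(fisher_combined([0.10, 0.08, 0.12]), 4))  # ≈ 0.0308
```

A sanity check on the formula: combining a single p-value returns it unchanged, since the chi-squared with 2 degrees of freedom is exponential.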

### Distribution

When the null hypothesis is true, if it takes the form H₀: θ = θ₀ and the underlying random variable is continuous, then the probability distribution of the p-value is uniform on the interval [0, 1]. By contrast, if the alternative hypothesis is true, the distribution is dependent on sample size and the true value of the parameter being studied.[6][7]

The distribution of p-values for a group of studies is sometimes called a p-curve.[8] The curve is affected by four factors: the proportion of studies that examined false null hypotheses, the power of the studies that investigated false null hypotheses, the alpha levels, and publication bias.[9] A p-curve can be used to assess the reliability of scientific literature, such as by detecting publication bias or p-hacking.[8][10]

### For composite hypothesis

In parametric hypothesis testing problems, a simple or point hypothesis refers to a hypothesis where the parameter’s value is assumed to be a single number. In contrast, in a composite hypothesis the parameter’s value is given by a set of numbers. For example, when testing the null hypothesis that a distribution is normal with a mean less than or equal to zero against the alternative that the mean is greater than zero (variance known), the null hypothesis does not specify the probability distribution of the appropriate test statistic. In the example just mentioned, that would be the Z-statistic belonging to the one-sided one-sample Z-test. For each possible value of the theoretical mean, the Z-test statistic has a different probability distribution. In these circumstances (the case of a so-called composite null hypothesis) the p-value is defined by taking the least favourable null-hypothesis case, which is typically on the border between null and alternative.

This definition ensures the complementarity of p-values and alpha levels. If we set the significance level alpha to 0.05, and only reject the null hypothesis if the p-value is less than or equal to 0.05, then our hypothesis test will indeed have significance level (maximal type 1 error rate) 0.05. As Neyman wrote: “The error that a practising statistician would consider the more important to avoid (which is a subjective judgment) is called the error of the first kind. The first demand of the mathematical theory is to deduce such test criteria as would ensure that the probability of committing an error of the first kind would equal (or approximately equal, or not exceed) a preassigned number α, such as α = 0.05 or 0.01, etc. This number is called the level of significance”; Neyman 1976, p. 161 in “The Emergence of Mathematical Statistics: A Historical Sketch with Particular Reference to the United States”, “On the History of Statistics and Probability”, ed. D.B. Owen, New York: Marcel Dekker, pp. 149–193. See also “Confusion Over Measures of Evidence (p’s) Versus Errors (α’s) in Classical Statistical Testing”, Raymond Hubbard and M. J. Bayarri, The American Statistician, August 2003, Vol. 57, No. 3, 171–182 (with discussion). For a concise modern statement see Chapter 10 of “All of Statistics: A Concise Course in Statistical Inference”, Larry Wasserman, Springer, 1st corrected ed. (September 17, 2004).

## Usage

The p-value is widely used in statistical hypothesis testing, specifically in null hypothesis significance testing. In this method, as part of experimental design, before performing the experiment one first chooses a model (the null hypothesis) and a threshold value for p, called the significance level of the test, traditionally 5% or 1%[11] and denoted as α. If the p-value is less than the chosen significance level α, that suggests the observed data is sufficiently inconsistent with the null hypothesis for the null hypothesis to be rejected. However, that does not prove that the tested hypothesis is false. When the p-value is calculated correctly, this test guarantees that the type I error rate is at most α. For typical analysis, using the standard α = 0.05 cutoff, the null hypothesis is rejected when p ≤ .05 and not rejected when p > .05. The p-value does not, in itself, support reasoning about the probabilities of hypotheses; it is only a tool for deciding whether to reject the null hypothesis.

### Misuse

Main article: Misuse of p-values

According to the ASA, there is widespread agreement that p-values are often misused and misinterpreted.[3] One practice that has been particularly criticized is accepting the alternative hypothesis for any p-value nominally less than .05 without other supporting evidence. Although p-values are helpful in assessing how incompatible the data are with a specified statistical model, contextual factors must also be considered, such as “the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis”.[3] Another concern is that the p-value is often misunderstood as being the probability that the null hypothesis is true.[3][12]

Some statisticians have proposed abandoning p-values and focusing more on other inferential statistics,[3] such as confidence intervals,[13][14] likelihood ratios,[15][16] or Bayes factors,[17][18][19] but there is heated debate on the feasibility of these alternatives.[20][21] Others have suggested removing fixed significance thresholds and interpreting p-values as continuous indices of the strength of evidence against the null hypothesis.[22][23] Yet others have suggested reporting, alongside p-values, the prior probability of a real effect that would be required to keep the false positive risk (i.e. the probability that there is no real effect) below a pre-specified threshold (e.g. 5%).[24]

## Calculation

Usually, T is a test statistic rather than any of the actual observations. A test statistic is the output of a scalar function of all the observations. This statistic provides a single number, such as the average or the correlation coefficient, that summarizes the characteristics of the data in a way relevant to a particular inquiry. As such, the test statistic follows a distribution determined by the function used to define that test statistic and the distribution of the input observational data.

For the important case in which the data are hypothesized to be a random sample from a normal distribution, different null hypothesis tests have been developed, depending on the nature of the test statistic and the hypotheses of interest about its distribution. Such tests include the z-test for hypotheses concerning the mean of a normal distribution with known variance, the t-test (based on Student’s t-distribution of a suitable statistic) for hypotheses concerning the mean of a normal distribution when the variance is unknown, and the F-test (based on the F-distribution of yet another statistic) for hypotheses concerning the variance. For data of another nature, for instance categorical (discrete) data, test statistics might be constructed whose null hypothesis distribution is based on normal approximations to appropriate statistics obtained by invoking the central limit theorem for large samples, as in the case of Pearson’s chi-squared test.

Thus, computing a p-value requires a null hypothesis, a test statistic (together with deciding whether the researcher is performing a one-tailed test or a two-tailed test), and data. Even though computing the test statistic on given data may be easy, computing the sampling distribution under the null hypothesis, and then computing its cumulative distribution function (CDF), is often a difficult problem. Today, this computation is done using statistical software, often via numeric methods rather than exact formulae; in the early and mid 20th century it was instead done via tables of values, from which one interpolated or extrapolated p-values. Rather than using a table of p-values, Fisher inverted the CDF, publishing a list of values of the test statistic for given fixed p-values; this corresponds to computing the quantile function (inverse CDF).
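As a sketch of this recipe for the simplest case, the Z statistic, the following stdlib-Python fragment computes one- and two-tailed p-values from the null CDF and, in the other direction, inverts the CDF to recover the cutoff value of the statistic for a fixed p, as in Fisher’s tables (the function names are my own):

```python
from statistics import NormalDist

std_normal = NormalDist()  # null distribution of the Z statistic

def p_from_z(z, tail="two"):
    """p-value for an observed Z statistic under the standard normal null."""
    upper = 1.0 - std_normal.cdf(z)   # P(Z >= z)
    lower = std_normal.cdf(z)         # P(Z <= z)
    if tail == "right":
        return upper
    if tail == "left":
        return lower
    return 2.0 * min(upper, lower)    # two-tailed

# Fisher's tables went the other way: invert the CDF (the quantile function)
# to get the statistic cutoff corresponding to a fixed p-value.
cutoff = std_normal.inv_cdf(1 - 0.05 / 2)   # about 1.96 for two-tailed p = 0.05
```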

## Example

Main article: Checking whether a coin is fair

As an example of a statistical test, an experiment is performed to determine whether a coin flip is fair (equal chance of landing heads or tails) or unfairly biased (one outcome being more likely than the other).

Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The full data X would be a sequence of twenty symbols, each “H” or “T”. The statistic on which one might focus could be the total number T of heads. The null hypothesis is that the coin is fair and that coin tosses are independent of one another. If a right-tailed test is considered, which would be the case if one is actually interested in the possibility that the coin is biased towards falling heads, then the p-value of this result is the chance of a fair coin landing on heads at least 14 times out of 20 flips. That probability can be computed from binomial coefficients as

Prob(14 heads) + Prob(15 heads) + ⋯ + Prob(20 heads) = (1/2^20) [ C(20,14) + C(20,15) + ⋯ + C(20,20) ] = 60,460/1,048,576 ≈ 0.058

This probability is the p-value, considering only extreme results that favor heads. This is called a one-tailed test. However, one might be interested in deviations in either direction, favoring either heads or tails. The two-tailed p-value, which considers deviations favoring either heads or tails, may instead be calculated. As the binomial distribution is symmetrical for a fair coin, the two-sided p-value is simply twice the above calculated single-sided p-value: the two-sided p-value is 0.115.

In the above example:

• Null hypothesis (H0): The coin is fair, with Prob(heads) = 0.5
• Test statistic: Number of heads
• Alpha level (designated threshold of significance): 0.05
• Observation O: 14 heads out of 20 flips; and
• Two-tailed p-value of observation O given H0 = 2 × min(Prob(no. of heads ≥ 14), Prob(no. of heads ≤ 14)) = 2 × min(0.058, 0.978) = 2 × 0.058 ≈ 0.115.

Note that Prob(no. of heads ≤ 14) = 1 − Prob(no. of heads ≥ 14) + Prob(no. of heads = 14) = 1 − 0.058 + 0.036 = 0.978; however, the symmetry of the binomial distribution makes computing the smaller of the two probabilities unnecessary. Here, the calculated p-value exceeds .05, meaning that the data falls within the range of what would happen 95% of the time were the coin in fact fair. Hence, the null hypothesis is not rejected at the .05 level.

However, had one more head been obtained, the resulting p-value (two-tailed) would have been 0.0414 (4.14%), in which case the null hypothesis would be rejected at the .05 level.
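The computations in this example can be reproduced with a few lines of stdlib Python:

```python
from math import comb

def binom_tail_ge(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p); p = 0.5 models a fair coin."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, heads = 20, 14
p_right = binom_tail_ge(n, heads)         # one-tailed p-value, ~0.058
p_left = 1 - binom_tail_ge(n, heads + 1)  # P(X <= 14)
p_two = 2 * min(p_right, p_left)          # two-tailed p-value, ~0.115

# One more head (15 of 20) drops the two-tailed p-value below .05.
p_two_15 = 2 * binom_tail_ge(n, 15)       # ~0.0414
```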

## History

Computations of p-values date back to the 1700s, where they were computed for the human sex ratio at birth, and used to compute statistical significance compared to the null hypothesis of equal probability of male and female births.[25] John Arbuthnot studied this question in 1710,[26][27][28][29] and examined birth records in London for each of the 82 years from 1629 to 1710. In every year, the number of males born in London exceeded the number of females. Considering more male or more female births as equally likely, the probability of the observed outcome is 1/2^82, or about 1 in 4,836,000,000,000,000,000,000,000; in modern terms, the p-value. This is vanishingly small, leading Arbuthnot to conclude that this was not due to chance but to divine providence: “From whence it follows, that it is Art, not Chance, that governs.” In modern terms, he rejected the null hypothesis of equally likely male and female births at the p = 1/2^82 significance level. This and other work by Arbuthnot is credited as “… the first use of significance tests …”[30] the first example of reasoning about statistical significance,[31] and “… perhaps the first published report of a nonparametric test …”,[27] specifically the sign test; see details at Sign test § History.

The same question was later addressed by Pierre-Simon Laplace, who instead used a parametric test, modeling the number of male births with a binomial distribution:[32]

In the 1770s Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded by calculation of a p-value that the excess was a real, but unexplained, effect.

The p-value was first formally introduced by Karl Pearson, in his Pearson’s chi-squared test,[33] using the chi-squared distribution and notated as capital P.[33] The p-values for the chi-squared distribution (for various values of χ2 and degrees of freedom), now notated as P, were calculated in (Elderton 1902), collected in (Pearson 1914, pp. xxxi–xxxiii, 26–28, Table XII).

The use of the p-value in statistics was popularized by Ronald Fisher,[34] and it plays a central role in his approach to the subject.[35] In his influential book Statistical Methods for Research Workers (1925), Fisher proposed the level p = 0.05, or a 1 in 20 chance of being exceeded by chance, as a limit for statistical significance, and applied this to a normal distribution (as a two-tailed test), thus yielding the rule of two standard deviations (on a normal distribution) for statistical significance (see 68–95–99.7 rule).[36][note 3][37]

He then computed a table of values, similar to Elderton’s but, importantly, with the roles of χ2 and p reversed. That is, rather than computing p for different values of χ2 (and degrees of freedom n), he computed values of χ2 that yield specified p-values, specifically 0.99, 0.98, 0.95, 0.90, 0.80, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.02, and 0.01.[38] That allowed computed values of χ2 to be compared against cutoffs and encouraged the use of p-values (especially 0.05, 0.02, and 0.01) as cutoffs, instead of computing and reporting p-values themselves. The same type of tables was then compiled in (Fisher & Yates 1938), which cemented the approach.[37]

As an illustration of the application of p-values to the design and interpretation of experiments, in his following book The Design of Experiments (1935), Fisher presented the lady tasting tea experiment,[39] which is the archetypal example of the p-value.

To evaluate a lady’s claim that she (Muriel Bristol) could distinguish by taste how tea is prepared (first adding the milk to the cup, then the tea, or first tea, then milk), she was sequentially presented with 8 cups: 4 prepared one way, 4 prepared the other, and asked to determine the preparation of each cup (knowing that there were 4 of each). In that case, the null hypothesis was that she had no special ability, the test was Fisher’s exact test, and the p-value was 1/C(8,4) = 1/70 ≈ 0.014, so Fisher was willing to reject the null hypothesis (consider the outcome highly unlikely to be due to chance) if all were classified correctly. (In the actual experiment, Bristol correctly classified all 8 cups.)

Fisher reiterated the p = 0.05 threshold and explained its rationale, stating:[40]

It is usual and convenient for experimenters to take 5 per cent as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard, and, by this means, to eliminate from further discussion the greater part of the fluctuations which chance causes have introduced into their experimental results.

He also applied this threshold to the design of experiments, noting that had only 6 cups been presented (3 of each), a perfect classification would have yielded a p-value of only 1/C(6,3) = 1/20 = 0.05, which would not have met this level of significance.[40] Fisher also underlined the interpretation of p as the long-run proportion of values at least as extreme as the data, assuming the null hypothesis is true.
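Both tea-tasting p-values are simple combinatorial counts, e.g. in Python:

```python
from math import comb

# One favourable arrangement (a perfect classification by pure guessing) out
# of C(8, 4) equally likely ways to pick which 4 of the 8 cups are milk-first.
p_eight_cups = 1 / comb(8, 4)   # 1/70, about 0.014: below the 0.05 level
p_six_cups = 1 / comb(6, 3)     # 1/20 = 0.05: does not fall below the level
```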

In later editions, Fisher explicitly contrasted the use of the p-value for statistical inference in science with the Neyman–Pearson method, which he terms “Acceptance Procedures”.[41] Fisher emphasizes that while fixed levels such as 5%, 2%, and 1% are convenient, the exact p-value can be used, and the strength of evidence can and will be revised with further experimentation. In contrast, decision procedures require a clear-cut decision, yielding an irreversible action, and the procedure is based on costs of error, which, he argues, are inapplicable to scientific research.

A closely related concept is the E-value,[42] the expected number of times in multiple testing that one would obtain a test statistic at least as extreme as the one actually observed, if one assumes the null hypothesis is true. The E-value is the product of the number of tests and the p-value.

The q-value is the analog of the p-value with respect to the positive false discovery rate.[43] It is used in multiple hypothesis testing to maintain statistical power while minimizing the false positive rate.[44]

## Notes

1. ^ Italicisation, capitalisation and hyphenation of the term vary. For example, AMA style uses “P value”, APA style uses “p value”, and the American Statistical Association uses “p-value”.[1]
2. ^ The statistical significance of a result does not imply that the result also has real-world relevance. For instance, a medicine might have a statistically significant effect that is too small to be interesting.
3. ^ To be more specific, the p = 0.05 corresponds to about 1.96 standard deviations for a normal distribution (two-tailed test), and 2 standard deviations corresponds to about a 1 in 22 chance of being exceeded by chance, or p ≈ 0.045; Fisher notes these approximations.

## References

2. ^ Aschwanden, Christie (2015-11-24). “Not Even Scientists Can Easily Explain P-values”. FiveThirtyEight. Archived from the original on 25 September 2019. Retrieved 11 October 2019.
3. ^ Wasserstein, Ronald L.; Lazar, Nicole A. (7 March 2016). “The ASA’s Statement on p-Values: Context, Process, and Purpose”. The American Statistician. 70 (2): 129–133. doi:10.1080/00031305.2016.1154108.
4. ^ Hubbard, Raymond; Lindsay, R. Murray (2008). “Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing”. Theory & Psychology. 18 (1): 69–88. doi:10.1177/0959354307086923.
5. ^ Ioannidis, John P. A.; et al. (January 2017). “A manifesto for reproducible science” (PDF). Nature Human Behaviour. 1: 0021. doi:10.1038/s41562-016-0021. S2CID 6326747.
6. ^ Bhattacharya, Bhaskar; Habtzghi, DeSale (2002). “Median of the p value under the alternative hypothesis”. The American Statistician. 56 (3): 202–6. doi:10.1198/000313002146. S2CID 33812107.
7. ^ Hung, H.M.J.; O’Neill, R.T.; Bauer, P.; Kohne, K. (1997). “The behavior of the p-value when the alternative hypothesis is true”. Biometrics (Submitted manuscript). 53 (1): 11–22. doi:10.2307/2533093. JSTOR 2533093. PMID 9147587.
8. ^ Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD (2015). “The extent and consequences of p-hacking in science”. PLOS Biol. 13 (3): e1002106. doi:10.1371/journal.pbio.1002106. PMC 4359000. PMID 25768323.
9. ^ Lakens D (2015). “What p-hacking really looks like: a comment on Masicampo and LaLande (2012)”. Q J Exp Psychol (Hove). 68 (4): 829–32. doi:10.1080/17470218.2014.982664. PMID 25484109.
10. ^ Simonsohn U, Nelson LD, Simmons JP (2014). “p-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results”. Perspect Psychol Sci. 9 (6): 666–81. doi:10.1177/1745691614553988. PMID 26186117. S2CID 39975518.
11. ^ Nuzzo, R. (2014). “Scientific method: Statistical errors”. Nature. 506 (7487): 150–152. Bibcode:2014Natur.506..150N. doi:10.1038/506150a. PMID 24522584.
12. ^ Colquhoun, David (2014). “An investigation of the false discovery rate and the misinterpretation of p-values”. Royal Society Open Science. 1 (3): 140216. arXiv:1407.5296. Bibcode:2014RSOS....140216C. doi:10.1098/rsos.140216. PMC 4448847. PMID 26064558.
13. ^ Lee, Dong Kyu (7 March 2017). “Alternatives to P value: confidence interval and effect size”. Korean Journal of Anesthesiology. 69 (6): 555–562. doi:10.4097/kjae.2016.69.6.555. ISSN 2005-6419. PMC 5133225. PMID 27924194.
14. ^ Ranstam, J. (August 2012). “Why the P-value culture is bad and confidence intervals a better alternative” (PDF). Osteoarthritis and Cartilage. 20 (8): 805–808. doi:10.1016/j.joca.2012.04.001. PMID 22503814.
15. ^ Perneger, Thomas V. (12 May 2001). “Sifting the evidence: Likelihood ratios are alternatives to P values”. BMJ: British Medical Journal. 322 (7295): 1184–5. doi:10.1136/bmj.322.7295.1184. ISSN 0959-8138. PMC 1120301. PMID 11379590.
16. ^ Royall, Richard (2004). “The Likelihood Paradigm for Statistical Evidence”. The Nature of Scientific Evidence. pp. 119–152. doi:10.7208/chicago/9780226789583.003.0005. ISBN 9780226789576.
17. ^ Schimmack, Ulrich (30 April 2015). “Replacing p-values with Bayes-Factors: A Miracle Cure for the Replicability Crisis in Psychological Science”. Replicability-Index. Retrieved 7 March 2017.
18. ^ Marden, John I. (December 2000). “Hypothesis Testing: From p Values to Bayes Factors”. Journal of the American Statistical Association. 95 (452): 1316–1320. doi:10.2307/2669779. JSTOR 2669779.
19. ^ Stern, Hal S. (16 February 2016). “A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference”. Multivariate Behavioral Research. 51 (1): 23–29. doi:10.1080/00273171.2015.1099032. PMC 4809350. PMID 26881954.
20. ^ Murtaugh, Paul A. (March 2014). “In defense of p-values”. Ecology. 95 (3): 611–617. doi:10.1890/13-0590.1. PMID 24804441.
21. ^ Aschwanden, Christie (Mar 7, 2016). “Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing P-Values”. FiveThirtyEight.
22. ^ Amrhein, Valentin; Korner-Nievergelt, Fränzi; Roth, Tobias (2017). “The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research”. PeerJ. 5: e3544. doi:10.7717/peerj.3544. PMC 5502092. PMID 28698825.
23. ^ Amrhein, Valentin; Greenland, Sander (2017). “Remove, rather than redefine, statistical significance”. Nature Human Behaviour. 2 (1): 0224. doi:10.1038/s41562-017-0224-0. PMID 30980046. S2CID 46814177.
24. ^ Colquhoun D (December 2017). “p-values”. Royal Society Open Science. 4 (12): 171085. doi:10.1098/rsos.171085. PMC 5750014. PMID 29308247.
25. ^ Brian, Éric; Jaisson, Marie (2007). “Physico-Theology and Mathematics (1710–1794)”. The Descent of Human Sex Ratio at Birth. Springer Science & Business Media. pp. 1–25. ISBN 978-1-4020-6036-6.
26. ^ John Arbuthnot (1710). “An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes” (PDF). Philosophical Transactions of the Royal Society of London. 27 (325–336): 186–190. doi:10.1098/rstl.1710.0011. S2CID 186209819.
27. ^ Conover, W.J. (1999), “Chapter 3.4: The Sign Test”, Practical Nonparametric Statistics (Third ed.), Wiley, pp. 157–176, ISBN 978-0-471-16068-7
28. ^ Sprent, P. (1989), Applied Nonparametric Statistical Methods (Second ed.), Chapman & Hall, ISBN 978-0-412-44980-2
29. ^ Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press. pp. 225–226. ISBN 978-0-674-40341-3.
30. ^ Bellhouse, P. (2001), “John Arbuthnot”, in Statisticians of the Centuries by C.C. Heyde and E. Seneta, Springer, pp. 39–42, ISBN 978-0-387-95329-8
31. ^ Hald, Anders (1998), “Chapter 4. Chance or Design: Tests of Significance”, A History of Mathematical Statistics from 1750 to 1930, Wiley, p. 65
32. ^ Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press. p. 134. ISBN 978-0-674-40341-3.
33. ^ Pearson, Karl (1900). “On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling” (PDF). Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
34. ^ Inman 2004.
35. ^ Hubbard, Raymond; Bayarri, M. J. (2003), “Confusion Over Measures of Evidence (p′s) Versus Errors (α′s) in Classical Statistical Testing”, The American Statistician, 57 (3): 171–178 [p. 171], doi:10.1198/0003130031856
36. ^ Fisher 1925, p. 47, Chapter III. Distributions.
37. ^ Dallal 2012, Note 31: Why P=0.05?.
38. ^ Fisher 1925, pp. 78–79, 98, Chapter IV. Tests of Goodness of Fit, Independence and Homogeneity; with Table of χ2. Table III. Table of χ2.
39. ^ Fisher 1971, II. The Principles of Experimentation, Illustrated by a Psycho-physical Experiment.
40. ^ Fisher 1971, Section 7. The Test of Significance.
41. ^ Fisher 1971, Section 12.1 Scientific Inference and Acceptance Procedures.
42. ^ National Institutes of Health definition of E-value
43. ^ Storey, John D (2003). “The positive false discovery rate: a Bayesian interpretation and the q-value”. The Annals of Statistics. 31 (6): 2013–2035. doi:10.1214/aos/1074290335.
44. ^ Storey, John D; Tibshirani, Robert (2003). “Statistical significance for genomewide studies”. PNAS. 100 (16): 9440–9445. Bibcode:2003PNAS..100.9440S. doi:10.1073/pnas.1530509100. PMC 170937. PMID 12883005.


Editorial | March 15, 2016

# The Enduring Evolution of the P Value

Demetrios N. Kyriacou, MD, PhD. JAMA. 2016;315(11):1113-1115. doi:10.1001/jama.2016.2152

Mathematics and statistical analyses contribute to the language of science and to every scientific discipline. Clinical trials and epidemiologic studies published in biomedical journals are essentially exercises in mathematical measurement.1 With the extensive contribution of statisticians to the methodological development of clinical trials and epidemiologic theory, it is not surprising that many statistical concepts have dominated scientific inferential processes, especially in research investigating biomedical cause-and-effect relations.1-6 For example, the comparative point estimate of a risk factor (eg, a risk ratio) is used to mathematically express the strength of the association between the presumed exposure and the outcome of interest.7-9 Mathematics is also used to express random variation inherent around the point estimate as a range that is termed a confidence interval.1 However, despite the greater degree of information provided by point estimates and confidence intervals, the statistic most frequently used in biomedical research for conveying association is the P value.10-14

In this issue of JAMA, Chavalarias et al15 describe the evolution of P values reported in biomedical literature over the last 25 years. Based on automated text mining of more than 12 million MEDLINE abstracts and more than 800 000 abstracts and full-text articles in PubMed Central, the authors found that a greater percentage of scientific articles reported P values in the presentation of study findings over time, with the prevalence of P values in abstracts increasing from 7.3% in 1990 to 15.6% in 2014. Among the abstracts and full-text articles with P values, 96% reported at least 1 “statistically significant” result, with strong clustering of reported P values around .05 and .001. In addition, in an in-depth manual review of 796 abstracts and 99 full-text articles from articles reporting empirical data, the authors found that P values were reported in 15.7% and 55%, respectively, whereas confidence intervals were reported in only 2.3% of abstracts and were included for all reported effect sizes in only 4% of the full-text articles. The authors suggested that rather than reporting isolated P values, research articles should focus more on reporting effect sizes (eg, absolute and relative risks) and uncertainty metrics (eg, confidence intervals for the effect estimates).

To provide context for the increasing reporting of P values in the biomedical literature over the past 25 years, it is important to consider what a P value really is, some examples of its frequent misconceptions and inappropriate use, and the evidentiary application of P values based on the 3 main schools of statistical inference (ie, Fisherian, Neyman-Pearsonian, and Bayesian philosophies).10,11

The prominence of the P value in the scientific literature is attributed to Fisher, who did not invent this probability measure but did popularize its extensive use for all forms of statistical research methods starting with his seminal 1925 book, Statistical Methods for Research Workers.16 According to Fisher, the correct definition of the P value is “the probability of the observed result, plus more extreme results, if the null hypothesis were true.”13,14 Fisher’s purpose was not to use the P value as a decision-making instrument but to provide researchers with a flexible measure of statistical inference within the complex process of scientific inference. In addition, there are important assumptions associated with proper use of the P value.10,11,13,14 First, there is no relation between the causal factor being investigated and the outcome of interest (ie, the null hypothesis is true). Second, the study design and analyses providing the effect estimate, confidence intervals, and P value for the specific study project are completely free of systematic error (ie, there are no misclassification, selection, or confounding biases). Third, the appropriate statistical test is selected for the analysis (eg, the χ2 test for a comparison of proportions).

Given these assumptions, it is not difficult to see how the concept of the P value became so frequently misunderstood and misused.10,13,14,16,17 Goodman has provided a list of 12 misconceptions of the P value.14 The most common and egregious of these misconceptions, for example, is that the P value is the probability of the null hypothesis being true. Another prevalent misconception is that if the P value is greater than .05, then the null hypothesis is true and there is no association between the exposure or treatment and outcome of interest.

Within the different philosophies of statistical inference, both the Fisherian and the Neyman-Pearsonian approaches are based on the “frequentist” interpretation of probability, which specifies that an experiment is theoretically considered one of an infinite number of exactly repeated experiments that yield statistically independent results.10-15,18 Frequentist methods are the basis of almost all biomedical statistical methods taught for clinical trials and epidemiologic studies. Although both the Fisherian and Neyman-Pearsonian approaches have many similarities, they have important philosophical and practical differences.

Fisher’s approach uses a calculated P value that is interpreted as evidence against the null hypothesis of a particular research finding.14 The smaller the P value, the stronger the evidence against the null hypothesis. There is no need for a predetermined level of statistical significance for the calculated P value. A null hypothesis can be rejected, but this is not necessarily based on a preset level of significance or probability of committing an error in the hypothesis test (eg, α < .05). In addition, there is no alternative hypothesis. Inference regarding the hypothesis is preferred over a mechanical decision to accept or reject a hypothesis based on a derived probability.

In contrast to Fisher, Neyman and Pearson in the 1930s formalized the hypothesis testing process with a priori assertions and declarations. For example, they added the concept of a formal alternative hypothesis that is mutually exclusive of the null hypothesis.10,11 In addition, a value, known as the significance level, is preselected as the criterion for rejecting the null hypothesis.13,14 The goal of the statistical calculations in the Neyman-Pearsonian approach is decision and not inference. By convention, the cutoff for determining statistical significance usually was selected to be a P value below .05. A calculated P value below the preselected level of significance is conclusively determined to be “statistically significant,” and the null hypothesis is rejected in favor of the alternate hypothesis. If the P value is above the level of significance, the null hypothesis is conclusively not rejected and assumed to be true.

Inevitably, this process leads to 2 potential errors. The first is rejecting the null hypothesis when it is actually true. This is known as a type I error and will occur with a frequency based on the level selected for determining significance (α). If α is selected to be .05, then a type I error will occur 5% of the time. The second potential error is accepting the null hypothesis when it is actually false. This is known as a type II error. The complement of a type II error is to reject the null hypothesis when it is truly false. This is termed the statistical power of a study and is the probability that a significance test will detect an effect that truly exists. It is also the basis for calculating sample sizes needed for clinical trials. The objective is to design an experiment to control or minimize both types of errors.10,11
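Both error rates can be checked by simulation. The stdlib-Python sketch below (the sample size, effect size, and seed are illustrative choices of mine, not from the editorial) estimates the type I error rate of a one-sided z-test under a true null, and its power under a true effect:

```python
import random
from statistics import NormalDist

def reject(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """One-sided z-test decision: reject H0: mu <= mu0 at level alpha."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * n**0.5 / sigma
    return 1.0 - NormalDist().cdf(z) <= alpha

random.seed(1)
trials, n = 10_000, 25

# Type I error rate: simulate under H0 (true mu = 0); should be near alpha.
type1 = sum(reject([random.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(trials)) / trials

# Power: simulate under a true effect (mu = 0.5); its complement is the
# type II error rate.
power = sum(reject([random.gauss(0.5, 1.0) for _ in range(n)])
            for _ in range(trials)) / trials
```

With these settings the estimated type I error rate comes out near the nominal 0.05, and the power near 0.80, matching the analytic value for a 0.5-standard-deviation effect with n = 25.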

The main criticism of the Neyman-Pearsonian approach is the extreme rigidity of thinking and arriving at a conclusion. The researcher must either accept or reject a proposed hypothesis and make a dichotomous scientific decision accordingly based on a predetermined accepted level of statistical significance (eg, α < .05). Making decisions with such limited flexibility is usually neither realistic nor prudent. For example, it would be unreasonable to decide that a new cancer medication was ineffective because the calculated P value from a phase 2 trial was .051 and the predetermined level of statistical significance was considered to be less than .05.

Statistical and scientific inference need not be constricted by such rigid thinking. A form of inductive inference can be used to assess causal relations with degrees of certainty characterized as spectrums of probabilities.19 This form of scientific reasoning, known as Bayesian induction, is especially useful for both statistical and scientific inferences by which effects are observed and the cause must be inferred. For example, if an investigator finds an association between a particular exposure and a specific health-related outcome, the investigator will infer the possibility of a causal relation based on the findings in conjunction with prior studies that evaluated the same possible causal effect. The degree of inference can be quantified using prior estimations of the effect estimate being evaluated.

The main advantage of Bayesian inductive reasoning is the ability to quantify the amount of certainty in terms of known or estimated conditional probabilities. Prior probabilities are transformed into posterior probabilities based on information obtained and included in Bayesian calculations. The main limitation of Bayesian methods is that prior information is often unknown or not precisely quantified, making the calculation of posterior probabilities potentially inaccurate. In addition, calculating Bayes factors (a statistical measure for quantifying evidence for a hypothesis based on Bayesian calculations) as an alternative to P values requires additional computational steps.20,21 Bayesian methods are also often not taught in classical statistics courses. For these reasons, Bayesian methods are not frequently used in most biomedical research analyses.22 However, scientific inferences based on P values and Bayesian methods are not necessarily mutually exclusive. Greenland and Poole20 have suggested incorporating P values into modern Bayesian analysis frameworks.
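
The prior-to-posterior calculation the editorial describes is easiest to see in odds form: posterior odds = Bayes factor × prior odds. A minimal sketch (the 50% prior and the Bayes factor of 3 are hypothetical values chosen for illustration):

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Update a prior probability for a hypothesis using a Bayes factor.

    posterior odds = bayes_factor * prior odds, then convert back to a probability.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Hypothetical example: a 50% prior and a Bayes factor of 3 in favor of the hypothesis
print(posterior_probability(0.5, 3.0))  # → 0.75
```

The same data (the same Bayes factor) can yield very different posterior probabilities under different priors, which is exactly the sensitivity to prior information the editorial identifies as a limitation.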

Fundamentally, statistical inference using P values involves mathematical attempts to facilitate the development of explanatory theory in the context of random error. However, P values provide only a particular mathematical description of a specific data set and not a comprehensive scientific explanation of cause-and-effect relationships in a target population. Each step in the biomedical scientific process should be guided by investigators and biostatisticians who understand and incorporate subject matter knowledge into the research process from prior epidemiologic studies, clinical research, basic science, and biological theory.

With the increasing use of P values in the biomedical literature as reported by Chavalarias et al, it becomes critically important to understand the true meaning of the P value, including its strengths, limitations, and most appropriate application for statistical inference. Despite increased teaching of methods and statistics in clinical medicine and for investigators, the authors’ finding that such a small proportion of abstracts reported effect sizes or measures of uncertainty is disappointing. There is nothing inherently wrong when P values are correctly used and interpreted. However, the automatic application of dichotomized hypothesis testing based on prearranged levels of statistical significance should be substituted with a more complex process using effect estimates, confidence intervals, and even P values, thereby permitting scientists, statisticians, and clinicians to use their own inferential capabilities to assign scientific significance.

Article Information

Corresponding Author: Demetrios N. Kyriacou, MD, PhD, JAMA, 330 N Wabash, Chicago, IL 60611 (demetrios.kyriacou@jamanetwork.org).

16. Fisher RA. Statistical Methods for Research Workers. Edinburgh, United Kingdom: Oliver & Boyd; 1925.
17. Stang A, Poole C, Kuss O. The ongoing tyranny of statistical significance testing in biomedical research. Eur J Epidemiol. 2010;25(4):225-230.
18. Perezgonzalez JD. Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Front Psychol. 2015;6:223.
19. Greenland S. Bayesian interpretation and analysis of research results. Semin Hematol. 2008;45(3):141-149.
20. Greenland S, Poole C. Living with p values: resurrecting a Bayesian perspective on frequentist statistics. Epidemiology. 2013;24(1):62-68.
21. Greenland S. Bayesian perspectives for epidemiological research, I: foundations and basic methods. Int J Epidemiol. 2006;35(3):765-775.
22. Bland JM, Altman DG. Bayesians and frequentists. BMJ. 1998;317(7166):1151-1160.

https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1537891


# The American Statistician

Getting to a Post “p < 0.05” Era

# Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing

Lee Kennedy-Shaffer | Pages 82-90 | Received 09 Mar 2018, Accepted 03 Oct 2018, Published online: 20 Mar 2019

## ABSTRACT

As statisticians and scientists consider a world beyond p < 0.05, it is important to not lose sight of how we got to this point. Although significance testing and p-values are often presented as prescriptive procedures, they came about through a process of refinement and extension to other disciplines. Ronald A. Fisher and his contemporaries formalized these methods in the early twentieth century and Fisher’s 1925 Statistical Methods for Research Workers brought the techniques to experimentalists in a variety of disciplines. Understanding how these methods arose, spread, and were argued over since then illuminates how p < 0.05 came to be a standard for scientific inference, the advantage it offered at the time, and how it was interpreted. This historical perspective can inform the work of statisticians today by encouraging thoughtful consideration of how their work, including proposed alternatives to the p-value, will be perceived and used by scientists. And it can engage students more fully and encourage critical thinking rather than rote applications of formulae. Incorporating history enables students, practitioners, and statisticians to treat the discipline as an ongoing endeavor, crafted by fallible humans, and provides a deeper understanding of the subject and its consequences for science and society.

## 1 Introduction

With new journal policies, conferences, and special issues, it is easy to view the debate around p-values and hypothesis testing as a modern invention. For many scientists whose primary connection to statistics is through these methods, the debate may seem like a challenge to the received wisdom of their profession, a rebuke to the way they have been using statistics for decades. For students learning the field, it can seem bewildering, and they might be tempted to replace one decontextualized methodology with another. Indeed, as Gerd Gigerenzer (2004, p. 589) writes, the anonymizing of the roots of the p-value and hypothesis testing has contributed to the idea that “they were given truths” and encouraged the “mindless” use of these procedures, to the point of misuse and abuse. But for those who have studied statistics, and, in particular, studied the progression of statistical theory, the debates are not a sudden attack on a completely accepted paradigm, and the statistics themselves did not arise wholly formed to be prescriptively applied. Rather, the statistics arose through the ongoing process of scientific discovery, with contributions by many along the way.

In order to properly understand the challenges that face statistics and its applications in science, medicine, and policy today, and to meet those challenges in the future, we must consider the history of the discipline and its most prominent methods. It is a history that is too poorly known, even among statisticians, but it is rich in characters, personal grudges, and academic debates. Gigerenzer (2004, pp. 587–588) laments this lack of focus on the history and controversy when he relates the story of a psychological statistics textbook author who removed all mention of Thomas Bayes, Ronald A. Fisher, Jerzy Neyman, and Egon Pearson. In a similar vein, Stephen Ziliak and Deirdre McCloskey (2008, p. 232) argue that the conscious erasure of William S. Gosset from the history contributed to the dominance of Fisher’s paradigm and reduced the prominence of competing ideas.

In Section 2, I trace the use of statistical reasoning similar to the modern p-value before 1900, demonstrating that the statistic and the use of thresholds did not arise from Karl Pearson and Fisher alone. In Section 3, I briefly describe the contributions of Pearson, Gosset, and Fisher, both covering the similarities among them and highlighting some of the debates that occurred as early as the 1920s when Fisher’s Statistical Methods for Research Workers began to put the p-value in the hands of experimenters. In Section 4, I point out some of the challenges that emerged in response to Fisher’s paradigm, focusing especially on those arising from Gosset, Neyman, and Egon Pearson, and from the Bayesian paradigm. These sections are far from comprehensive; rather, they seek to provide an overview of the history that can spur thought and encourage further research. In Section 5, I present resources that can be used for that research and as teaching tools. I also discuss how the historical debates relate to modern arguments surrounding the p-value and how that can encourage statisticians to craft a more useful and durable response to this controversy. I further describe the role this history can play in education in formal classroom settings and in research and collaboration settings.

Understanding how p-values and 0.05 came to occupy their prominent role in twentieth century statistics reminds us that these “arbitrary” thresholds came about through work to make mathematical statistics more practical and useful for experimentalists. But these efforts were never without controversy. Learning this history will help statisticians better appreciate the translational challenges of their own work by improving understanding of the fact that, since its inception, the modern field of statistics has grappled with the balance between mathematical rigor and practical use to scientists. Those pushing the boundaries of knowledge in the discipline will surely face this balance in their own work. They will have to consider, like the statisticians of the early twentieth century did, how others will use their theories.

Learning this history will help practitioners understand that no method is sacred and that all methods are products of the era in which they were born and the functions to which they have been applied. As technology, mathematics, and science develop, new methods or adjustments to old methods will be needed as the underlying assumptions no longer apply, whether in a world of early electronic computing devices or a world of big data.

Learning this history will help students access the discipline by learning of the faults, personal and professional, of those who came up with today’s commonly used statistics and help them understand statistics as a living discipline rich with ongoing debate and new understandings. Indeed, one can find many parallels between today’s debate and the controversies that arose with the development of p-values and significance testing, framing the ASA statement and subsequent discussion as another step in the ongoing evolution of the discipline of statistics.

## 2 A World Before Fisher

The p-value is generally credited to Karl Pearson’s (1900) article; Ronald A. Fisher’s (1925) Statistical Methods for Research Workers then formalized the concept and expanded its reach to experimenters (Hubbard 2016, p. 14). But statistics similar to p-values and probabilistic reasoning akin to hypothesis tests existed well before then. Both Stigler (1986) and David and Edwards (2001) point to John Arbuthnott’s (1710) “An Argument for Divine Providence” as perhaps the earliest use of probabilistic reasoning that matches that of a modern null hypothesis test. Using birth data from London, Arbuthnott (1710) notes that births of males exceeded births of females for 82 years in a row. Supposing that the probability of males exceeding females in a year is 50%, and implicitly assuming independence across the years, Arbuthnott calculates the minuscule probability of this 82-year pattern. “From whence it follows,” Arbuthnott (1710, p. 189) confidently concludes, “that it is Art, not Chance, that governs.” Any modern student who has run a test of proportions would notice the reasoning, see Arbuthnott’s calculation of a p-value of 2.07 × 10⁻²⁵, and confirm his rejection of the null hypothesis that each year has an independent probability of 50%. The mathematically inclined physician’s goal in this endeavor was to demonstrate the work of “Divine Providence” in the sex distribution (Arbuthnott 1710, p. 186). Many statisticians would recognize the flaw in this reasoning: the lack of a clearly stated alternative hypothesis that would be logically implied by a rejection of the null hypothesis. Gigerenzer (2004, p. 588) decries this “null ritual” used by experimentalists who often fail to properly specify “alternative substantive hypotheses.”
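
Arbuthnott’s calculation is easy to reproduce: under a 50% chance each year and independence across the 82 years, the probability that male births exceed female births every single year is (1/2)⁸²:

```python
# Probability of 82 consecutive male-majority years under the null hypothesis
# of a fair 50% chance each year, assumed independent across years
p = 0.5 ** 82
print(f"{p:.3g}")  # → 2.07e-25, matching Arbuthnott's figure
```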

In the nineteenth century, French mathematicians used similar methods to analyze a wide variety of data. In celestial mechanics, Pierre-Simon Laplace (1827, p. S.30) found a small value for a statistic closely related to the modern p-value and concluded that it indicated with a high likelihood that the discrepancy in the measurements was thus “not due solely to the anomalies of chance.” Stigler (1986, p. 151) notes that Laplace himself appealed to a 0.01 significance level in his work. Stigler (1986, pp. 151–153) further highlights several errors implicit in Laplace’s analysis, errors that would be familiar to students and critics of modern hypothesis testing: improper assumptions of independence and improper estimation of variance.

Not long after, Siméon-Denis Poisson used a quantity equal to one minus a modern p-value in describing patterns in the outcomes of French jury trials. Two comparisons he makes are particularly instructive. In one, he finds a p-value of 0.0897, a value not large enough for him to conclude that there has been a change in causes (Poisson 1837, p. 373). Shortly thereafter, a p-value of 0.00468 leads Poisson to believe that in that case there is a “real anomaly in the votes of juries” (Poisson 1837, pp. 376–77). Poisson’s conclusions in these two cases, nearly a century before Fisher’s work, would comport with a 0.05 (or 0.01) significance threshold, though he did not specify the threshold he used. Poisson (1837, p. 375) also refused to make a causal statement from his identified associations, noting that “the calculation cannot teach us” this answer.

Antoine Augustin Cournot formulated the p-value in fairly explicit terms, noting that as a measure of the importance of some discrepancy it combines the size of the effect and the sample size (Cournot 1843, p. 196). Cournot (1843, pp. 196–197) also issues a warning about the narrow-minded use of probabilistic statements, noting that this p-value does not fully capture the importance of the effect size and “does not at all measure the chance of truth or of error pertaining to a given judgment.” With a little modernization of language, Cournot could have written principles 2, 5, and 6 of the ASA Statement (Wasserstein and Lazar 2016).

In 1885, Francis Ysidro Edgeworth provided a more formal mathematical underpinning for the significance test and gave a simple example of how to use the standard deviation (he used the “modulus,” equal to the standard deviation multiplied by the square root of two) to perform a significance test on a given parameter (Edgeworth 1885, pp. 184–185). Using a threshold of twice the “modulus,” Edgeworth (1885) constructed a test that would be equivalent to a modern two-sided α = 0.005. Stigler (1986, p. 311) notes that this “was a rather exacting test” and that Edgeworth also considered smaller differences as “worthy of notice, although he admitted the evidence was then weaker.”
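
The equivalence to a modern two-sided α ≈ 0.005 can be checked directly: twice Edgeworth’s “modulus” is 2√2 standard deviations, and the two-sided normal tail probability beyond that point is about 0.0047. A quick verification:

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()              # standard normal
threshold = 2 * sqrt(2)       # twice the "modulus" (sigma * sqrt(2)), in SD units
alpha = 2 * (1 - z.cdf(threshold))  # two-sided tail probability beyond the threshold
print(f"{alpha:.5f}")         # → 0.00468, i.e., roughly alpha = 0.005
```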

The existence of these tests of significance and p-value-like quantities long before the twentieth century demonstrates that this method of inference had an alluring rationale for practitioners in a variety of fields. Their errors in interpretations and words of caution, however, presage the controversies that would follow. Throughout the twentieth century, many of the technical probability results needed for modern significance testing arose through the theory of errors, by which astronomers and other physical scientists combined measurements and discarded outliers (Gigerenzer, Swijtink, and Daston 1989, pp. 80–84). These developments allowed Pearson, Gosset, and Fisher to make key contributions that formalized, shaped, and popularized the modern form of significance tests.

## 3 R.A. Fisher: the Experimentalist Statistician

In the early twentieth century, the forerunners of modern statistics began to determine the properties of various useful distributions. Karl Pearson (1900) described the χ2 distribution and uses of the χ2 statistic, including its use in tests of independence for proportions. Pearson (1900, pp. 157–158) here denoted by P the “chances of a system of errors with as great or greater frequency than that denoted by χ.” In an example involving dice throws, Pearson (1900, pp. 167–168) finds P = 0.000016 on a null distribution of equal probability of each face appearing and claims that “it would be reasonable to conclude that dice exhibit bias towards the higher points.” The combination of this type of probabilistic reasoning and a distribution with many practical uses made the p-value more approachable and brought it more or less to its modern formulation. W. Palin Elderton built on Pearson’s work and produced tables of values for this distribution that would enable investigators to test the goodness of fit. His article, published in Biometrika in 1902, devoted roughly half of its space to these tables (Elderton 1902). Ziliak and McCloskey (2008, pp. 199–202) note that Pearson was soon teaching his students, and enforcing as a rule for authors seeking publication in Biometrika, that three probable errors, or two standard errors, represented “certain significance.”
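
Pearson’s statistic is still computed the same way today. A minimal sketch of the χ² goodness-of-fit calculation for dice (the counts below are hypothetical, not Pearson’s data); with 6 faces there are 5 degrees of freedom, for which the tabulated 0.05 critical value is 11.07:

```python
def chi_square(observed, expected):
    """Pearson's chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 600 throws of a die; a fair die expects 100 per face
observed = [95, 107, 88, 112, 101, 97]
expected = [100] * 6
stat = chi_square(observed, expected)
print(round(stat, 2))  # → 3.72, well below the df = 5, alpha = 0.05 critical value 11.07
```

With these counts the statistic falls far short of the threshold, so an investigator using Fisher-style tables would see no evidence of bias.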

William Sealy Gosset, the head experimental brewer at Guinness publishing under the pseudonym “Student” (1908, p. 25), found a curve “representing the frequency distribution of values of the means of such samples,” that is, samples from a normal or “not strictly normal” distribution, “when these values are measured from the mean of the population in terms of the standard deviation of the sample.” This so-called Student’s t distribution is now taught in introductory and applied statistics courses, as it forms the basis for a substantial number of inferential procedures. Gosset’s initial paper focused as much on illustrating examples of the utility of this curve as on the mathematical justification for its use, and he produced numerous tables to enable others to use it. He calculated statistics akin to the p-value and drew conclusions from extreme values of these. For one drug trial, he regarded a statistic equivalent to p = 0.0015 as “such a high probability,” it would be in practical matters “considered as a certainty” (Student 1908, p. 21). For Gosset, however, whether an effect existed or not was less important than its impact, and he saw the use of the tests more in determining the “pecuniary advantage” of one decision versus another (Ziliak and McCloskey 2008, pp. 18–19). That is, any conclusion must rest on effect size and the relative loss and gain of any potential decision; this will be a recurring theme in the debate between competing frameworks for testing discussed below.
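
Gosset’s statistic is the familiar one-sample t. A minimal sketch of the computation (the paired differences below are hypothetical illustrative values, not Gosset’s drug-trial data):

```python
from math import sqrt

def t_statistic(xs, mu0=0.0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n)), with sample SD s."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # unbiased sample variance
    return (mean - mu0) / sqrt(s2 / n)

# Hypothetical paired differences (e.g., extra hours of sleep under a treatment)
diffs = [1.2, 0.8, 1.9, 0.4, 1.1]
print(round(t_statistic(diffs), 2))  # → 4.36, referred to Student's t with n - 1 = 4 df
```

The small-sample correction is the whole point: with only a handful of observations, this statistic must be referred to Student’s t distribution rather than the normal, exactly the problem Gosset’s tables solved.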

Ronald A. Fisher, who had corresponded with Pearson and Gosset at various points, was well aware of these advances and thus of the use of significance tests. His work, especially a series of three monographs published in the 1920s and 1930s, would expand the reach of significance tests, promote their use (and the use of statistically rigorous experimental design and analysis more broadly) to researchers, and provide tables that enabled investigators to conduct such tests.

Fisher, employed at the time at Rothamsted Experimental Station, an agricultural research institution, “extended the range of tests of significance” using the theory of maximum likelihood commonly used today and conceived of tests for small sample problems (Box 1978, p. 254). In 1922, he published three key manuscripts which covered the theoretical foundations of maximum likelihood estimation and the concept of the likelihood (Fisher 1922), the use of Pearson’s χ2 distribution to calculate p-values from contingency tables (Fisher 1922), and the use of Student’s t distribution to conduct significance tests on regression coefficients (Fisher 1922). In 1925, he published the first edition of Statistical Methods for Research Workers, which sought, in his words, “to put into the hands of research workers, and especially of biologists, the means of applying statistical tests accurately to numerical data” (Fisher 1925, p. 16). The book discusses in detail the meaning and practical implications of “P”, the statistic now known as the p-value, and suggests 0.05 as a useful cutoff:

The value for which P = .05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a negative result only once in 22 trials, even if the statistics are the only guide available. Small effects would still escape notice if the data were insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty (Fisher 1925, p. 47)

This simple paragraph demonstrates the probability-based definition of the p-value that is commonly misunderstood: that it is the probability of a result as or more extreme than the observed result given that the null hypothesis is true (Greenland et al. 2016). Additionally, it makes immediately apparent why 0.05 is convenient: it is roughly equivalent to the probability of being more than two standard deviations away from the mean of a normally distributed random variable. In this way, 0.05 can be seen not as a number Fisher plucked from the sky, but as a value that resulted from the need for ease of calculation at a time before computers rendered tables and approximations largely obsolete. This particular value had the added bonus of corresponding to three “probable errors,” a measure of spread of the normal distribution used commonly in early statistics but now largely forgotten (Stigler 1986, p. 230). So a useful rule of thumb could be given to researchers on either of two scales of measuring the spread of the distribution. Later, in applying the statistic to the χ2 distribution, Fisher (1925, p. 79) remarks that “[w]e shall not often be astray if we draw a conventional line at .05, and consider that higher values of χ2 indicate a real discrepancy.”

Statistical Methods was most valuable in the hands of experimentalists due to its explanations of tests and estimation procedures, illustrative examples, and a wealth of user-friendly tables. The tables further entrenched the use of Fisher’s preferred p-value cutoffs by displaying the calculated figures so that an investigator looked up a desired probability level for the distribution and found the quantile of the statistic that corresponded to it. Among the levels presented one almost always found 0.05 and 0.01 (Fisher 1925). As Fisher’s biographer and daughter, Box (1978, p. 246), states, “[b]y this means he produced concise and convenient tabulations of the desired quantities” and presented values “that were of immediate interest to the experimenter.” It is this accessibility that made the book popular among practicing experimentalists who “had not a hive of staff humming at their desk calculators,” but it did not endear him to more rigorous mathematicians (Box 1978, pp. 242–246). And through these presentations, which Fisher (1935; Fisher and Yates 1938) continued and expanded in The Design of Experiments and the 1938 compilation of tables with Francis Yates entitled Statistical Tables for Biological, Agricultural, and Medical Research, he set the standard for the use of p-values and statistical inference in a variety of forms of research. One may note, however, that Fisher’s tables show that he did not think 0.05 was one size fits all; if 0.05 worked in every setting, there would have been only one column in each table.

The history of the tables presented in Statistical Methods is interesting in itself and further demonstrates how these values came to be presented; it also foreshadows forthcoming schisms regarding these tests. Hubbard (2004, p. 311) notes that Pearson’s Biometrika denied Fisher permission to use Elderton’s table of χ2 probabilities in his monograph. When he created his own version, according to Pearson et al. (1990, p. 52), Fisher “gave the values of χ2 for selected values of P … and thus introduced the concept of nominal levels of significance.” Because of this change from Elderton’s table to Fisher’s, for users of the table in Statistical Methods and its successors in Statistical Tables, it would be easier to compare a calculated χ2 value to a set threshold of significance rather than find the precise p-value. For tables of the t statistic, as Ziliak and McCloskey (2008, p. 229) note, “Fisher himself copyrighted again Gosset’s tables in his own name” in Statistical Methods (emphasis in original). Through this action, which left Gosset’s name out of the book except in the phrase “Student’s t”, Fisher removed Gosset from the history of his own statistic, hid his contributions, and, more importantly, hid his competing philosophy on how the statistic should be used (Ziliak and McCloskey 2008, pp. 230–232). Reprinters of the table and those who used it in applied research would encounter only Fisher’s versions and his interpretations.

Following Fisher, the use of p-values grew among experimentalists. In the United States, they were particularly encouraged by Harold Hotelling of Stanford University, who called some of the tables in Statistical Methods “indispensable for the worker with moderate-sized samples” (quoted in Ziliak and McCloskey 2008, p. 234). George Snedecor of Iowa State University played a crucial role as well, continuing to develop the methods and promoting their use in scientific fields (Hubbard 2016, p. 21). Psychologists, sociologists, political scientists, and economists all found the innovations useful (Hubbard 2016, pp. 22–27). Thus, the p-value spread not only across oceans but beyond the natural sciences to the social sciences, echoing its use by Poisson a century earlier.

The use of 0.05 as a cutoff became customary, though not all-encompassing. Fisher’s student L. H. C. Tippett (1931, p. 48), wrote in The Method of Statistics that the 0.05 threshold was “quite arbitrary” but “in common use.” Lancelot Hogben (1957, p. 495), two decades later, wrote that Fisher’s claim that the cutoff was in usual practice was “true only of those who rely on the many rule of thumb manuals expounding Fisher’s own test prescriptions.” For scientists and students today, perhaps the prominence of this admittedly arbitrary cutoff is difficult to comprehend. However, they need only consider a time before computers and compare the calculation of a p-value by hand from one of Fisher’s or Gosset’s or Pearson’s formulae to the ease by which one can determine whether a statistic meets a threshold by reference to one of Fisher’s tables. It will immediately become clear how Fisher’s standard became the gold standard. Fisher led other tables to adopt his format through his role as secretary of the Tables Committee of the British Association (Box 1978, p. 247), ensuring that future statisticians who sought to reach experimentalists would need to reconcile their methods to this framework. Thus, “p < 0.05” could grow to the prominence it holds today.

## 4 Challenges to Fisher’s View

The other piece of history often lost in the presentation of p-values is that statisticians brought many challenges to Fisher’s framework as soon as it was presented. As Fisher was writing his manuscripts, Jerzy Neyman and Egon Pearson (1933) were preparing their own framework for hypothesis testing. Rather than focusing on falsifying a null hypothesis, Neyman and Pearson presented two competing hypotheses, a null hypothesis and an alternative hypothesis, and framed testing as a means of choosing between them. The decision then must balance two types of error, one made by incorrectly rejecting the null hypothesis when it is true (Type I Error) and one made by incorrectly accepting the null hypothesis when it is false (Type II Error). More generally, one can consider the class of “admissible alternative hypotheses” of which the null hypothesis is a member (Neyman and Pearson 1933, p. 294); the goal is then to compare the null hypothesis to the alternative that imparts the highest likelihood on the observed data. They propose a class of tests that, for a given limit of Type I Error, minimize the risk of Type II Error, the so-called most powerful tests. The Type I Error risk, often called the significance level and denoted α, is commonly set at 0.05 (or 0.01), as the pair noted in their paper. The Type II Error risk, often denoted β, is equal to one minus what we now call the power of the test.

This procedure has many similarities to Fisher’s framework that uses the p-value as a continuous measure of evidence against the null hypothesis; indeed, in many cases, Fisher’s choice of test statistic corresponds to a reasonable choice of alternative hypothesis in a Neyman–Pearson most powerful test (Lehmann 1993, pp. 1243, 1246). In those cases, p < α if and only if the most powerful α-level test would reject the null hypothesis. Nonetheless, the two factions debated fiercely the merits of each version. In one sense, the controversy can be regarded as a debate over the role of the statistician and of the test itself: should the test be considered as a step along the way to deeper understanding, a piece of evidence among many to be considered in crafting and supporting a scientific theory? Or should it be considered as a guide to decision-making, a way to choose the next behavior, whether in a practical or experimental setting? Fisher’s writings generally support the former view, taking the test and the p-value as a piece of evidence in the scientific process, one that he wrote “is based on a fact communicable to, and verifiable by, other rational minds” (Fisher 1956, p. 43). For Neyman and Pearson, on the other hand, to accept a hypothesis means “to act as if it were true” and thus the hypotheses and error probabilities should be chosen in light of the consequences of making either decision (Gigerenzer, Swijtink, and Daston 1989, p. 101).
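
The agreement between the two frameworks in such cases can be seen directly: for a two-sided z-test, p < α exactly when the statistic exceeds the α-level critical value. A minimal sketch (the z statistics below are hypothetical):

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05
critical = z.inv_cdf(1 - alpha / 2)  # ≈ 1.96, the alpha-level critical value

for stat in (1.5, 1.97, 2.6):        # hypothetical observed z statistics
    p = 2 * (1 - z.cdf(abs(stat)))   # Fisher-style two-sided p-value
    reject = abs(stat) > critical    # Neyman–Pearson alpha-level decision
    assert (p < alpha) == reject     # the two decisions coincide
    print(f"z = {stat}: p = {p:.3f}, reject = {reject}")
```

The decisions coincide even though the interpretations differ: Fisher reads p as a graded measure of evidence, while Neyman and Pearson read the rejection as a behavior with controlled error rates.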

In a practical way, the Neyman–Pearson view also meant considering the reasonability of alternative hypotheses. Berkson (1938, p. 531) provided an application of this question, discussing how someone familiar with the data would only truly reject the null hypothesis “on the willingness to embrace an alternative one.” The debate took on a variety of aspects, however, including being somewhat representative of a larger controversy over the role of mathematical rigor in statistics, with Fisher assailing Neyman and Pearson as mathematicians whose work failed to reflect the nuances of scientific inference (Gigerenzer, Swijtink, and Daston 1989, p. 98). It also covered differences in the role assigned to a statistical model of data and decision-making, which in turn relate to fundamental probability questions about defining populations and samples (Lenhard 2006). All of these differences were heightened and perhaps even exaggerated by “the ferocity of the rhetoric” (Lehmann 1993, p. 1242).

While this debate raged in the halls of academic statisticians for decades (and, even today, attempts are made to clearly define the differences or reconcile the two theories), experimentalists began to follow a third way, an “anonymous hybrid consisting of the union of the ideas developed by” Fisher and Neyman–Pearson (Hubbard 2004, p. 296, emphasis in original). Often, reporting of results will include a comparison of the p-value to a threshold level (e.g., 0.05) to claim the existence of an effect, reporting of the p-value itself, and relative measures of evidence such as “highly significant,” “marginally significant,” and “nearly significant.” This leads to what Hubbard (2004, p. 297) calls a “confusion between p’s and α’s” among applied researchers, as seen in textbooks, journal articles, and even publication manuals. This confusion undermines the rigorous Neyman–Pearson interpretation of limiting error to a prespecified level α. And the role of the value of p as a quantitative piece of ongoing scientific investigation (including using null hypotheses that are not a hypothesis of zero effect) favored by Fisher is lost to the decision-making encouraged by a statement of significance or lack thereof. Neither Fisher nor Neyman and Pearson would approve of this hybrid, though it has been institutionalized by textbooks and curricula, especially in applied settings. Its popularity owes a great deal to its simplicity and the ability of applied researchers to perform this “ritual” of testing in a more mechanized fashion (Gigerenzer 2004).

While this debate was ongoing, a revival of another paradigm of probability gained steam. Based on a crucial theorem by Thomas Bayes that was published in 1763, the “inverse probability” or Bayesian viewpoint embraced the subjectivity of statistical analysis (Weisberg 2014, sec. 10.2). With regard to testing, the Bayesian approach allows a researcher to calculate the probability of a specific hypothesis given the observed data, rather than the converse, which is what the Fisher and Neyman–Pearson approaches do. These views gained considerable traction after Leonard J. (Jimmie) Savage’s 1954 publication of The Foundations of Statistics, which also replied to anticipated objections to the paradigm. His work builds on that of Bruno de Finetti (1937) and Harold Jeffreys (1939).
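To make the contrast concrete, here is a minimal sketch (my own illustration, not drawn from any of the works cited) of a two-hypothesis Bayesian update: the quantity computed is the probability of the hypothesis given the data, which the Fisher and Neyman–Pearson frameworks never provide. The prior and likelihood values below are hypothetical.

```python
# Bayes' theorem for two exhaustive hypotheses H0 and H1:
# P(H1 | data) = P(H1) * P(data | H1) / [P(H1) * P(data | H1) + P(H0) * P(data | H0)]

def posterior(prior_h1, like_h1, like_h0):
    """Posterior probability of H1 given the data, for two exhaustive hypotheses."""
    prior_h0 = 1.0 - prior_h1
    return (prior_h1 * like_h1) / (prior_h1 * like_h1 + prior_h0 * like_h0)

# Hypothetical numbers: the data are 4x as likely under H1 as under H0,
# but H1 starts with only a 0.1 prior probability.
p = posterior(prior_h1=0.1, like_h1=0.4, like_h0=0.1)
print(round(p, 3))  # 0.308: the data favor H1, yet H0 remains more probable
```

The answer depends on the prior, which is exactly the subjectivity the Bayesian viewpoint embraces and its frequentist critics objected to.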

Bayesian ideas were present before then, however, as Fisher (1922, pp. 325–330) included in his article on maximum likelihood a rejection of Bayesian approaches. Fisher (1922, p. 326) even notes that the works of Laplace and Poisson, discussed above, “introduced into their discussions of this subject ideas of a similar character” to inverse probability. While this article is far too short to cover the debate between the various Bayesian approaches and the frequentist approaches of Fisher, Gosset, and Neyman–Pearson, Savage’s book is a useful starting point, and a higher-level summary can be found in Weisberg (2014). Sharon McGrayne (2011) provides a very accessible overview of the Bayesian approach, its history, and the common use of Bayesian methods in practical research even while it was philosophically rejected by statisticians. These debates, too, are ongoing, with Bayesians or frequentists holding more sway in different scientific fields (Gigerenzer, Swijtink, and Daston 1989, pp. 91, 105), and Bayesian approaches are often suggested as alternatives to p-values, as discussed below.

In addition to these broad philosophical challenges, statisticians and scientists objected to Fisher’s p-value on practical grounds. Gosset wrote to Fisher and to Karl Pearson of the importance of considering effect sizes and, indeed, arranging experiments “so that the correlation should be as high as possible” (quoted in Ziliak and McCloskey 2008, p. 224). Fisher’s own co-author, Francis Yates, wrote in 1951 (p. 33) of his concern that experimenters were regarding “the execution of a test of significance as the ultimate objective.” Fisher (1956, p. 42) himself later wrote that “no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses.” His book even included a chapter entitled “Some misapprehensions about tests of significance” (Fisher 1956, p. 75). His writings on the matter, however, are sometimes contradictory and admit several interpretations (Gigerenzer, Swijtink, and Daston 1989, p. 97). Medical statistician Joseph Berkson (1942, p. 326) feared a disconnect between significance testing and “ordinary rational discourse,” especially in applying a rule to these tests without regard to “the circumstances in which it is applied” (p. 329). The International Biometric Society’s British Regional President warned in 1969 that significance tests might “exercise their own unintentional brand of tyranny over other ways of thinking” (Skellam 1969, p. 474). Psychologist William Rozeboom (1960) wrote of the failings of p-values and significance testing, including the uncritical appeals to 0.05. Other psychologists and social scientists soon followed (Bakan 1966; Meehl 1967; Skipper et al. 1967).

These arguments, which began as soon as the paradigm-defining works were published, would all be familiar to those following the modern debates and resonate in the ASA statement (Wasserstein and Lazar 2016). Discussing and teaching the modern debate without acknowledging its historical roots does a disservice not only to those thinkers who engaged in the debate, but to the statistics profession as a whole.

## 5 History as Context to Inform the Present Debate

In this article, I have endeavored to recount only a small part of the history of p-values and significance testing, which itself forms only a small part of the history of probability and statistics. Much more can be found on these subjects. David Salsburg’s (2001) The Lady Tasting Tea provides a highly accessible treatment. Stephen Stigler’s (1986) The History of Statistics: The Measurement of Uncertainty Before 1900 covers the early history of the p-value and how it fit into notions of reasoning about uncertainty; Theodore Porter’s book The Rise of Statistical Thinking, 1820–1900, also published in 1986, covers the latter end of this pre-Pearson history. H. A. David and A. W. F. Edwards’s (2001) Annotated Readings in the History of Statistics highlights primary source material relevant to this history. In books published twenty-five years apart, Gigerenzer et al. (1989) and Weisberg (2014) describe the rise of the dominant modern mathematical conception of probability and how that influenced and was influenced by the rise of these statistics and of data-driven sciences. Stephen Ziliak and Deirdre McCloskey (2008) describe the rise of these statistics in the early twentieth century in great detail, focusing on reviving the forgotten role of Gosset through presentation and interpretation of his archival materials; they also describe the spread of the Fisherian paradigm in economics, psychology, and law, and the consequences of that spread. Finally, Donald MacKenzie’s (1981) Statistics in Britain, 1865–1930 discusses Fisher and his immediate predecessors in detail, focusing especially on the effects of the British social context on the work of Francis Galton, Karl Pearson, and Fisher and on how eugenics shaped the statistical work of the three men and the rise of statistics in Britain.

Articles and books exploring the use of statistical inference, especially hypothesis testing, in specific fields can be informative of this history as well: Morrison and Henkel (1970) write of the controversies in the social sciences; Hubbard (2016) discusses the use of statistics in the management sciences as well as the social sciences; Hubbard (2004) and Chambers (2017) describe the controversies in psychology; Kadane, Fienberg, and DeGroot (1986) discuss the use of statistics in the field of law with several case studies; I (Kennedy-Shaffer 2017) cover some of this history with a focus on significance testing at the United States Food and Drug Administration. These various detailed accounts, among others, clarify the lessons that statisticians and practitioners can take from this history and provide ample material for statistics educators to incorporate this history into their formal and informal teaching.

### 5.1 Lessons for Statisticians and Practitioners

The history of the p-value and significance testing is useful for statisticians and scientists who use statistical methods today for a variety of reasons. The history helps clarify today’s debates, adding a long-term dimension to modern discussions. In this way, it illuminates the factors that drive the creation of statistical theory and methods and what enables them to catch on in the broader community. Understanding these factors will help statisticians respond to today’s debates and consider how proposed solutions to problems that have arisen will play out in the scientific community today and in the future.

First of all, the history clarifies the debates that are occurring today; in particular, many of the objections raised to p-values by modern scientists (and in Wasserstein and Lazar (2016) and the accompanying Online Discussion) were raised by contemporaries of Fisher. One particular aspect, the importance of considering effect size rather than simply statistical significance, was the crux of the difference between Fisher’s framework and Gosset’s (Ziliak and McCloskey 2008). Ziliak (2016) reiterates this connection in an article in the Online Discussion, demonstrating the relevance of historical debates to today’s discussion. A thread of argument from Fisher’s earliest critics (and indeed Cournot and Edgeworth before him) to Rothman (2016) indicates that the de-emphasizing of effect size in favor of the p-value is an easy mistake to make and one that needs to be addressed. Similarly, debates have continued over the conflating of Fisher’s paradigm with the Neyman–Pearson approach, as discussed above. Lew (2016) describes how these different inferential questions have become hybridized. Discussions of power and the role of statisticians in the design of experiments arise in the commentaries by Berry (2016) and Gelman (2016). While their approaches are quite different, Fisher certainly understood that argument, writing an entire book on how to properly design experiments (Fisher 1935); Gosset, too, participated in this discussion, disagreeing with Fisher on key aspects (Ziliak and McCloskey 2008, p. 218). And the Bayesian-frequentist debate continues today, unresolved after decades of discussion. Among others, Benjamin and Berger (2016) and Chambers (2017, pp. 68–73) promote the potential use of Bayesian hypothesis testing as an alternative to p-values and significance testing.

It would be easy to be disheartened by this history. If we have been debating these ideas, raising similar arguments for a century, what hope do we have of solving them now? And, as Goodman (2016) puts it, “what will prevent us from dusting this same statement off 100 years hence, to remind the community yet again of how to do things right?” The history may provide the answer here. In particular, a closer look at how Fisher’s ideas spread and how the hybridization of the Fisher and Neyman–Pearson paradigms occurred, processes discussed here only briefly, can inform us of what makes statistical methods catch hold in the broader scientific, policymaking, and public communities. Berry (2016) notes that statisticians should not seek to “excuse ourselves by blaming nonstatisticians for their failure to understand or heed what we tell them.” But we can understand why they fail to heed us. Benjamini (2016) notes that the p-value was so successful in science because it “offers a first-line defense against being fooled by randomness.” That is, it was useful to nonmathematicians in giving them a quantitative basis for addressing uncertainty. Additionally, it has some intuitive meaning, as can be seen by the fact that methods similar to the p-value arose repeatedly in various fields even before Fisher. And it had passionate advocates who put the tools into the hands of scientists in a way that was easy to use, for example through Fisher and Yates’s Statistical Tables. Finally, it was responsive to conditions of the time. These approaches addressed questions about variance and experimental design that were frequently raised at the time (Gigerenzer et al. 1989, pp. 73–74). Considering these virtues, Abelson (1997) suggests in a tongue-in-cheek piece that significance tests would be re-invented if they were banned and forgotten.

A response that gains traction outside of academic statisticians and that is durable, I argue, must meet these same criteria, summarized in Table 1. And moreover, to remain valuable, it must be able to adjust to changing conditions. For example, as many authors, including Weisberg (2014, sec. 12.3), have discussed, our computational and data-gathering capabilities have changed enormously over the last several years, to say nothing of changes since 1925. We have seen how the lack of computing power at the time rendered Fisher’s tables so valuable and thus so influential to practitioners. And the limited computer capabilities of the 1950s may have limited the ability of Bayesian methods to catch on with a wider audience (Weisberg 2014, sec. 8.4). The ease of computation is one cause of the multiplicity issues that are commonly discussed (Ioannidis 2005; Benjamini 2016). However, there is no reason to believe that computing capabilities have plateaued, and so an appropriate response would take into account not only today’s conditions, but also those likely to occur in the future. Moreover, as we have seen, statistical methods are not always used with fidelity to the original intents and assumptions, especially decades after their initial formulation. Several of the responses to p-values, as Benjamini (2016) notes, would be susceptible to misuse as well.
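The multiplicity issue mentioned above can be made concrete with a short sketch (my illustration, not from the cited sources): under independence, the family-wise chance of at least one spurious “significant” result grows quickly with the number of tests run.

```python
# With m independent tests each at level alpha, the probability of at least
# one Type I error (false positive) across the family is 1 - (1 - alpha)**m.

def familywise_error(alpha, m):
    """Probability of >= 1 Type I error across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

print(round(familywise_error(0.05, 1), 3))    # 0.05
print(round(familywise_error(0.05, 20), 3))   # 0.642
print(round(familywise_error(0.005, 20), 3))  # 0.095 -- a stricter threshold helps
```

As computation has made it trivial to run many tests, this inflation is one reason multiplicity features so prominently in the modern critiques (Ioannidis 2005; Benjamini 2016).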

### Table 1 Criteria for a lasting framework for inference beyond p < 0.05

Certainly, these are high demands to make of any statistical method, or indeed of any scientific methodology at all. And the sheer variety of alternatives proposed indicate that even the statistical community has not coalesced around one. To take one example, consider the proposal to lower the significance threshold to 0.005 (Benjamin et al. 2018). Table 1 summarizes whether and how p < 0.005 addresses the criteria for a lasting framework, not to argue for or against it, but to suggest the utility of this framework in assessing responses beyond p < 0.05. This proposal has several advantages: it maintains the ease of use and familiarity that scientists prize and can be viewed as in line with the approaches of Fisher (who often wrote of different thresholds in different settings) or of Neyman and Pearson (if it represents some true cost of a Type I Error and is paired with Type II Error control). It also addresses some of the multiplicity issues that have arisen from changing conditions and the reduced computational burden. This is not even the first time it has been proposed; as discussed, Edgeworth implicitly used this threshold at times, and threshold proposals varied greatly before and even after Fisher. However, it is, as Benjamin et al. (2018) acknowledge, just as arbitrary as current thresholds and just as susceptible to misinterpretation. And the benefits in addressing multiplicity may fade as datasets get bigger and tests are run even more frequently. Little (2016) also notes that lowering the threshold fails to address the longstanding debate between statistical significance and substantive significance. But differing thresholds have worked in other fields and this proposal may have a great deal of value in certain settings. And with tables of significance thresholds no longer necessary thanks to modern computing power, it is quite easy for researchers to use different thresholds at different times. 
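As a rough illustration of what the proposal changes in practice (my own sketch, not from Benjamin et al. 2018), the two-sided p-value of a standard-normal test statistic can be computed with the complementary error function; a borderline result is then “significant” at 0.05 but not at 0.005.

```python
# Two-sided p-value for a standard-normal (z) test statistic:
# p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
from math import erfc, sqrt

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return erfc(abs(z) / sqrt(2))

p = two_sided_p(2.2)        # a borderline z-statistic, chosen for illustration
print(round(p, 4))          # 0.0278
print(p < 0.05, p < 0.005)  # True False: rejected under the conventional
                            # threshold, not under the proposed one
```

The same computation also shows why printed tables are no longer needed: any threshold, in any field, is one function call away.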
This suggests that no one method and no one response to the controversy will be sufficient. A multitude of responses, tailored to scientific purposes and fields of study, will be much more likely to be able to address all of these needs. Indeed, one can see this as an extension of arguments made at various points by both Fisher and Neyman–Pearson that different experimenters, working in different contexts, will use different thresholds of significance or set different α and β parameters. As Fisher’s work focused on agriculture and biology, perhaps his advice still holds sway there, while other fields face different needs. Beyond just significance thresholds, different scientific questions can be approached with the variety of tools available, from Bayesian approaches to confidence intervals to machine learning, to suit their context. Such an approach, however, relies on a great deal of statistical sophistication among those who use statistical methods. Fortunately, this history can help improve statistics education and guide changes that would enhance that sophistication.

### 5.2 The Role of History in Statistics Education

The rise in popularity of statistics books aimed at general audiences, including some listed above, demonstrates the desires of many people to learn both the practical uses of the discipline and the way in which it came to be. Statistics educators broadly defined, whether course instructors, statistical collaborators, or writers of articles aimed at nonstatisticians, can benefit from this interest and use history as a teaching tool within this moment of debate in the discipline. The British mathematician John Fauvel (1991, pp. 4–5) presented a variety of reasons for incorporating history into mathematics education, including to “increase motivation for learning,” “explain the role of mathematics in society,” and “contextualise mathematical studies.” A decade later, the Taiwanese educator Po-Hong Liu expounded these ideas. He noted specifically that “[h]istory reveals the humanistic facets of mathematical knowledge” and can challenge students’ perceptions “that mathematics is fixed, rather than flexible, relative, and humanistic” (Liu 2003, p. 418).

These reasons all hold for statistics, especially as the discipline faces great change, not just in the use of conventional inferential methods but also with the rise of computing power and big data. As Goodman (2016, p. 1) notes: “that statisticians do not all accept at face value what most scientists are routinely taught as uncontroversial truisms will be a shock to many.” To meet Millar’s (2016) and Stangl’s (2016) challenge of improving statistical education, teachers and collaborators should consider the introduction of this history into their discussions of significance testing. Presenting these controversies requires educators to present other approaches and thus also serves to, as Millar (2016) suggests, “make our students aware that p-values are not the ‘only way.’”

The topics covered here can be introduced alongside the presentation of the tables of values of the normal, Student’s t, and χ2 distributions, which still hold a place as early as the Advanced Placement Statistics curriculum (AP 2010). Inviting students to consider how the lack of computers affected the development of statistics may further appreciation for these tables (or, more likely, further appreciation for the computer software that has rendered them obsolete). This in turn will help students appreciate what has changed since 1925 and how methods may need to change to reflect that.

Presenting the debate between Fisher, Gosset, Neyman-Pearson, and the Bayesians, and how that debate has evolved into the current discussion, highlights the human aspect of statisticians and the constantly changing, challenging nature of the field. As discussed above, many of the specific points made in that debate are ongoing points of contention today. In-depth analysis of Fisher’s rationale for using the 0.05 standard can highlight how, though arbitrary, it is not without context, and how it responded to the needs of experimentalists at a certain point of history. This understanding will allow students and practitioners to form their own assessment of, for example, the proposal to lower the standard to 0.005. In this way, it becomes harder to dismiss the p-value without providing a substitute that is similarly usable by those who perform statistical analyses today. This teaching will also give students and practitioners the ability to critique the next statistical method that comes along, and to consider alternatives to the p-value in the context of statistical history and the role of statistics in modern science and society.

## 6 Conclusion

As we consider a world “beyond p < 0.05,” I invite statisticians and scientists alike to consider the world before p < 0.05, a world where statistical analysis was less common and far more difficult an undertaking. It is then easier to see how p-values came to such prominence throughout science, despite the immediate disagreements among statisticians. Statistics is an evolving discipline, but it is in the difficult position of needing to evolve alongside the various disciplines that make use of its tools. In Fisher’s teaching and manuscripts, writes Box (1978, p. 242), “he aimed to give workers a chance to familiarize themselves with tools of statistical craft as he had become familiar with them, and to evolve better ways of using them.” This approach helped make statistics a fundamental tool in many disciplines, but has led to the challenges discussed in the ASA statement and elsewhere. Presenting this history as context for these discussions provides appropriate recognition of the rich debates that define statistics. It encourages statisticians to consider how their work will be used by practitioners and encourages practitioners to consider whether they are using statistical methodologies as they were intended. Through ongoing discussions and by encouraging this critical thinking, statistics can continue to be a field that helps push forward the boundaries of knowledge.

## Acknowledgments

### Funding

The author’s studies are supported by Grant No. 5T32AI007358-28 from the National Institute of Allergy and Infectious Diseases, U.S. National Institutes of Health.

## References

• Abelson, R. P. (1997), “A Retrospective on the Significance Test Ban of 1999 (If There Were No Significance Tests, They Would Be Invented),” in What If There Were No Significance Tests?, eds. L. L. Harlow, S. A. Mulaik and J. H. Steiger, Multivariate Applications, Mahwah, NJ: Lawrence Erlbaum Associates, pp. 117–141.
• AP (2010), “Statistics Course Description,” available at https://secure-media.collegeboard.org/ap-student/course/ap-statistics-2010-course-exam-description.pdf.
• Arbuthnott, J. (1710), “An Argument for Divine Providence, Taken From the Constant Regularity Observ’d in the Births of Both Sexes,” Philosophical Transactions (1683–1775), 27, 186–190.
• Bakan, D. (1966), “The Test of Significance in Psychological Research,” Psychological Bulletin, 66, 423–437. DOI: 10.1037/h0020412.
• Benjamin, D. J. and Berger, J. O. (2016), “Comment: A Simple Alternative to p-values,” The American Statistician, Online Discussion, 70, 1–2.
• Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., Fehr, E., Fidler, F., Field, A. P., Forster, M., George, E. I., Gonzalez, R., Goodman, S., Green, E., Green, D. P., Greenwald, A. G., Hadfield, J. D., Hedges, L. V., Held, L., Ho, T. H., Hoijtink, H., Hruschka, D. J., Imai, K., Imbens, G., Ioannidis, J. P. A., Jeon, M., Jones, J. H., Kirchler, M., Laibson, D., List, J., Little, R., Lupia, A., Machery, E., Maxwell, S. E., McCarthy, M., Moore, D. A., Morgan, S. L., Munafò, M., Nakagawa, S., Nyhan, B., Parker, T. H., Pericchi, L., Perugini, M., Rouder, J., Rousseau, J., Savalei, V., Schönbrodt, F. D., Sellke, T., Sinclair, B., Tingley, D., Van Zandt, T., Vazire, S., Watts, D. J., Winship, C., Wolpert, R. L., Xie, Y., Young, C., Zinman, J. and Johnson, V. E. (2018), “Redefine Statistical Significance,” Nature Human Behaviour, 2, 6–10. DOI: 10.1038/s41562-017-0189-z.
• Benjamini, Y. (2016), “It’s Not the p-values’ Fault,” The American Statistician, Online Discussion, 70, 1–2.
• Berkson, J. (1938), “Some Difficulties of Interpretation Encountered in the Application of the Chi-square Test,” Journal of the American Statistical Association, 33, 526–536. DOI: 10.1080/01621459.1938.10502329.
• Berkson, J. (1942), “Tests of Significance Considered as Evidence,” Journal of the American Statistical Association, 37, 325–335. DOI: 10.1080/01621459.1942.10501760.
• Berry, D. A. (2016), “p-values Are Not What They’re Cracked Up to Be,” The American Statistician, Online Discussion, 70, 1–2.
• Box, J. F. (1978), R. A. Fisher, the Life of a Scientist, Wiley Series in Probability and Mathematical Statistics, New York: Wiley.
• Chambers, C. (2017), The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice, Princeton, NJ: Princeton University Press.
• Cournot, A. A. (1843), Exposition de la Théorie des Chances et des Probabilités, Paris: L. Hachette.
• David, H. A. and Edwards, A. W. F. (2001), Annotated Readings in the History of Statistics, Springer Series in Statistics: Perspectives in Statistics, New York: Springer.
• De Finetti, B. (1937), “La Prévision: Ses Lois Logiques, Ses Sources Subjectives,” Annales de l’Institut Henri Poincaré, 7, 1–68.
• Edgeworth, F. Y. (1885), “Methods of Statistics,” Journal of the Statistical Society of London, Jubilee Volume, 181–217.
• Elderton, W. P. (1902), “Tables for Testing the Goodness of Fit of Theory to Observation,” Biometrika, 1, 155–163. DOI: 10.2307/2331485.
• Fauvel, J. (1991), “Using History in Mathematics Education,” For the Learning of Mathematics, 11, 3–6.
• Fisher, R. A. (1922a), “The Goodness of Fit of Regression Formulae, and the Distribution of Regression Coefficients,” Journal of the Royal Statistical Society, 85, 597–612. DOI: 10.2307/2341124.
• Fisher, R. A. (1922b), “On the Interpretation of χ2 from Contingency Tables, and the Calculation of P,” Journal of the Royal Statistical Society, 85, 87–94. DOI: 10.2307/2340521.
• Fisher, R. A. (1922c), “On the Mathematical Foundations of Theoretical Statistics,” Philosophical Transactions of the Royal Society of London, Series A, 222, 309–368. DOI: 10.1098/rsta.1922.0009.
• Fisher, R. A. (1925), Statistical Methods for Research Workers, Edinburgh: Oliver and Boyd.
• Fisher, R. A. (1935), The Design of Experiments, Edinburgh: Oliver and Boyd.
• Fisher, R. A. (1956), Statistical Methods and Scientific Inference, Edinburgh: Oliver and Boyd.
• Fisher, R. A. and Yates, F. (1938), Statistical Tables for Biological, Agricultural and Medical Research, Edinburgh: Oliver and Boyd.
• Gelman, A. (2016), “The Problems With p-values Are Not Just with p-values,” The American Statistician, Online Discussion, 70, 1–2.
• Gigerenzer, G. (2004), “Mindless Statistics,” Journal of Socio-Economics, 33, 587–606. DOI: 10.1016/j.socec.2004.09.033.
• Gigerenzer, G., Swijtink, Z., and Daston, L. (1989), The Empire of Chance: How Probability Changed Science and Everyday Life, New York: Cambridge University Press.
• Goodman, S. N. (2016), “The Next Questions: Who, What, When, Where, and Why?” The American Statistician, Online Discussion, 70, 1–2.
• Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N. and Altman, D. G. (2016), “Statistical Tests, p-values, Confidence Intervals, and Power: A Guide to Misinterpretations,” The American Statistician, Online Discussion, 70, 1–12.
• Hogben, L. T. (1957), Statistical Theory: The Relationship of Probability, Credibility, and Error, London: Allen & Unwin.
• Hubbard, R. (2004), “Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research,” Theory & Psychology, 14, 295–327. DOI: 10.1177/0959354304043638.
• Hubbard, R. (2016), Corrupt Research: The Case for Reconceptualizing Empirical Management and Social Science, Thousand Oaks, CA: SAGE Publications.
• Ioannidis, J. P. A. (2005), “Why Most Published Research Findings Are False,” PLoS Medicine, 2, e124. DOI: 10.1371/journal.pmed.0020124.
• Jeffreys, H. (1939), The Theory of Probability, Oxford: Oxford University Press.
• Kadane, J. B., Fienberg, S. E. and DeGroot, M. H. (1986), Statistics and the Law, Wiley Series in Probability and Mathematical Statistics, New York: Wiley.
• Kennedy-Shaffer, L. (2017), “When the Alpha is the Omega: p-values, Substantial Evidence, and the 0.05 Standard at FDA,” Food & Drug Law Journal, 72, 595–635.
• Laplace, P. S. (1827), Traité de Mécanique Céleste, Supplément, Paris: Duprat.
• Lehmann, E. L. (1993), “The Fisher, Neyman–Pearson Theories of Testing Hypotheses: One Theory or Two?” Journal of the American Statistical Association, 88, 1242–1249. DOI: 10.1080/01621459.1993.10476404.
• Lenhard, J. (2006), “Models and Statistical Inference: The Controversy Between Fisher and Neyman–Pearson,” British Journal for the Philosophy of Science, 57, 69–91. DOI: 10.1093/bjps/axi152.
• Lew, M. J. (2016), “Three Inferential Questions, Two Types of p-value,” The American Statistician, Online Discussion, 70, 1–2.
• Little, R. J. (2016), “Comment,” The American Statistician, Online Discussion, 70, 1–2.
• Liu, P.-H. (2003), “Do Teachers Need to Incorporate the History of Mathematics in Their Teaching?” The Mathematics Teacher, 96, 416–421.
• MacKenzie, D. A. (1981), Statistics in Britain, 1865–1930: The Social Construction of Scientific Knowledge, Edinburgh: Edinburgh University Press.
• McGrayne, S. B. (2011), The Theory that Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy, New Haven, CT: Yale University Press.
• Meehl, P. (1967), “Theory-testing in Psychology and Physics: A Methodological Paradox,” Philosophy of Science, 34, 103–115. DOI: 10.1086/288135.
• Millar, A. M. (2016), “ASA Statement on p-values: Some Implications for Education,” The American Statistician, Online Discussion, 70, 1–2.
• Morrison, D. E., and Henkel, R. E. (1970), The Significance Test Controversy: A Reader, Methodological Perspectives, Chicago, IL: Aldine Publishing.
• Neyman, J. and Pearson, E. S. (1933), “The Testing of Statistical Hypotheses in Relation to Probabilities a Priori,” Mathematical Proceedings of the Cambridge Philosophical Society, 29, 492–510. DOI: 10.1017/S030500410001152X.
• Pearson, E. S., Plackett, R. L. and Barnard, G. A. (1990), ‘Student’: A Statistical Biography of William Sealy Gosset, Oxford: Clarendon Press.
• Pearson, K. (1900), “X. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it can be Reasonably Supposed to Have Arisen from Random Sampling,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50, 157–175. DOI: 10.1080/14786440009463897.
• Poisson, S.-D. (1837), Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile: Précédées des Règles Générales du Calcul des Probabilités, Paris: Bachelier.
• Porter, T. M. (1986), The Rise of Statistical Thinking, 1820–1900, Princeton, NJ: Princeton University Press.
• Rothman, K. J. (2016), “Disengaging From Statistical Significance,” The American Statistician, Online Discussion, 70, 1. [Google Scholar]
• Rozeboom, W. W. (1960), “The Fallacy of the Null-Hypothesis Significance Test,” Psychological Bulletin. 57, 416–428. [Crossref][PubMed][Web of Science ®][Google Scholar]
• Salsburg, D. (2001), The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, New York: W.H. Freeman and Co. [Google Scholar]
• Savage, L. J. (1954), The Foundations of Statistics, New York: Wiley. [Google Scholar]
• Skellam, J. G. (1969), “Models, Inference, and Strategy,” Biometrics 25, 457–475. [Crossref][PubMed][Web of Science ®][Google Scholar]
• Skipper, J. K., Guenther, A. L., and Nass, G. (1967), “The Sacredness of .05: A Note Concerning the Uses of Statistical Levels of Significance in Social Science,” American Sociology, 2, 16–18. [Google Scholar]
• Stangl, D. (2016), “Comment,” The American Statistician, Online Discussion, 70, 1. [Web of Science ®][Google Scholar]
• Stigler, S. M. (1986), The History of Statistics: The Measurement of Uncertainty Before 1900, Cambridge, MA: Belknap Press of Harvard University Press. DOI: 10.1086/ahr/93.4.1019. [Crossref][Google Scholar]
• Student (1908), “The Probable Error of a Mean,” Biometrika, 6, 1–25. [Crossref][Google Scholar]
• Tippett, L. H. C. (1931), The Methods of Statistics: An Introduction Mainly for Workers in the Biological Sciences, London: Williams and Norgate. [Google Scholar]
• Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on p-values: Context, Process, and Purpose,” The American Statistician, 70, 129–133. DOI: 10.1080/00031305.2016.1154108. [Taylor & Francis Online][Web of Science ®][Google Scholar]
• Weisberg, H. I. (2014), Willful Ignorance: The Mismeasure of Uncertainty, Hoboken, NJ: Wiley. [Crossref][Google Scholar]
• Yates, F. (1951), “The Influence of Statistical Methods for Research Workers on the Development of the Science of Statistics,” Journal of American Statistical Association, 46, 19–34. DOI: 10.2307/2280090. [Crossref][Web of Science ®][Google Scholar]
• Ziliak, S. T. (2016), “The Significance of the ASA Statement on Statistical Significance and p-values,” The American Statistician, Online Discussion, 70, 1–2. [Web of Science ®][Google Scholar]
• Ziliak, S. T. and McCloskey, D. N. (2008), The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives, Ann Arbor, MI: University of Michigan Press. [Google Scholar]


# Everything You Know about the P-Value is Wrong

Editor’s Note: This week, I’m excited to introduce two new tl;dr bloggers extraordinaire!

Jason Shumake works as a data scientist at UT Austin’s Institute for Mental Health Research, where he builds statistical models to predict mental health outcomes. He also develops R packages that make it easier for researchers to “practice safe stats”. Check out a couple of his recent publications:

Eimeira Padilla has worked as the investigational drug pharmacist for the Ascension – Texas Family of Hospitals since 2014. In this role, she collaborates with other clinical members in the facilitation of clinical drug research. She is also a Pharmacist Faculty/Preceptor, co-investigator, and mentor on several studies, including multiple pharmacy residency projects, and she has served on the institution’s ethical research review board.

I don’t think there’s a better or more down-to-earth overview of the p-value anywhere, so happy stats reading!

(Whoa, I just used ‘happy’ and ‘stats’ in the same sentence. But seriously, happy stats reading!)

I bet you didn’t know and might be disturbed to learn that most conclusions from published biomedical research are false!

For example, in the field of drug discovery, one report found that fewer than 25% of published findings could be reproduced. This has been largely blamed on a “publish or perish” culture, which offers perverse incentives to report “significant” (p < 0.05) results instead of rewarding the methodical pursuit of truth.

We can’t change research culture with a single blog post. But we can raise your awareness about some bad statistical practices that contribute to these dire replication rates. We’re going to begin with the mother of all culprits: abusing the p-value.

You may be asking yourself, how do we abuse said p-value? Well, we’re so glad you asked!

## What Is a P-Value?

Before we dive into p-value abuse and ways to avoid it, let’s get comfortable with a p-value definition. If we can’t agree on what a p-value is, we’re not going to get very far with this article, are we?

A p-value is the probability of obtaining a result as or more extreme than the one observed, **given that the null hypothesis is true**.

Let’s take a closer look at the text in bold because it’s an important part of the definition that’s easy to misunderstand—or forget about entirely.

First, notice the phrase “given that”. This tells us that the probability is conditional on an assumption. The assumption is that the null hypothesis is TRUE.

And what is the null hypothesis? Think of it as the default position that someone is trying to disprove. Usually a statement along the lines of “there’s nothing to see here,” like:

• There’s no difference between those groups.
• There’s no relationship between those variables.
• That drug has no therapeutic effect.

But the null hypothesis doesn’t have to be a negative statement. For example, if someone is trying to demonstrate that Drug A is not inferior to Drug B, then the null hypothesis would be that Drug A is inferior to Drug B. However, to keep things simple, let’s think about a specific, typical example of a null hypothesis – that a drug has no therapeutic effect.

By definition, the p-value applies to “Null-Hypothesis World”—a hypothetical world in which the drug we’re testing has no effect. Like definitely, for sure, no effect. This leads us to the first common misinterpretation of the p-value: A p-value is NOT the probability that the null hypothesis is true.

For example, if we test the effect of our drug against a placebo and obtain p = 0.01, does that mean there is only a 1% chance that our drug is ineffective (and therefore a 99% chance that it is effective)?

No! Why not? Remember “given that the null hypothesis is true”? The p-value can’t be the probability of the very thing it assumes to be true! Here’s the key point:

### The p-value is not the likelihood of the null hypothesis given your data; it’s the likelihood of your data given the null hypothesis.

Now, you might expect that those two likelihoods are related, and they are. But they are not the same, and it’s not straightforward to derive one from the other. If it’s not obvious why, think about this:

What is the probability that the ground is wet given the hypothesis it’s raining?

Very high, right?

Now, what about the probability that it’s raining given the ground is wet?

Not quite the same, huh?

No doubt it’s higher than it’d be if the ground were dry, but there are many reasons besides rain that the ground might be wet. Rain is only one of them. The same goes for your hypothesis—its likelihood depends on a lot more than one puny data set!
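To make the rain example concrete, here’s a quick Bayes’-rule calculation. Every number below is made up purely for illustration:

```r
# Hypothetical numbers, for illustration only
p_rain        <- 0.10  # prior probability that it's raining
p_wet_rain    <- 0.95  # P(ground wet | raining) -- very high
p_wet_no_rain <- 0.20  # P(ground wet | not raining) -- sprinklers, spills, etc.

# Total probability that the ground is wet
p_wet <- p_wet_rain * p_rain + p_wet_no_rain * (1 - p_rain)

# Bayes' rule: P(raining | ground wet)
p_rain_wet <- p_wet_rain * p_rain / p_wet
round(p_rain_wet, 2)  # about 0.35 -- nowhere near 0.95
```

Flipping the direction of the conditional takes the probability from 0.95 down to about 0.35, which is exactly why P(data | null) and P(null | data) must not be confused.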

There is a branch of statistics, called Bayesian Statistics, that is concerned with quantifying the likelihood of a hypothesis in light of prior beliefs and new data. The traditional approach, called Frequentist Statistics, takes a more qualitative approach. The logic goes something like this: if the data are super unlikely to occur in Null-Hypothesis World, then we can infer the data do not come from Null-Hypothesis World (i.e., the null hypothesis is wrong).

By convention, p < 0.05 has been the typical criterion for “super unlikely if the null hypothesis is true.” But, it’s important to understand that this convention—which takes us from a “p-value” to a “decision about the merits of a hypothesis”—is indirect, arbitrary, and subjective. Let’s say that again for emphasis. We make clinical decisions about the usefulness of a drug based on the convention that a p-value of < 0.05 is significant. But this convention was literally drawn out of thin air…it was an arbitrary cut off point that some people agreed on.

Note: For a similar “mind-blowing” arbitrary convention that we all use, check out the background story of estimating renal function with Cockcroft-Gault.

We’re like 2 degrees removed from actually knowing that our drug is effective, which is why statisticians use cumbersome language like “fail to reject the null.” It’s a linguistic reminder that the p-value applies to a particular data set and its compatibility with the null hypothesis. If you collect another data set testing the same hypothesis, you’ll almost certainly get a different p-value—oftentimes a very different one.

We’ll come back to this point in a bit, but for now we want to draw your attention to a very important word in the definition of the p-value.

That word is “the!” As in “given the null hypothesis is true,” as in a single hypothesis test. Not a hundred hypothesis tests. Not ten hypothesis tests. One hypothesis test.

And we also need to point out something that is not usually stated explicitly but is super important: this single hypothesis test must be specified prior to running an experiment. In the modern world, researchers often collect multiple variables and perform multiple statistical tests in the absence of a pre-specified hypothesis. Or they formulate hypotheses after looking at the data they’ve collected. As we will explain, these practices lead to p-values that are horribly biased.

## Is a Smaller P-Value “More” Significant?

Based on the definition of the p-value, we cannot directly relate the magnitude of the p-value to the likelihood that a hypothesis is correct.

In a rough sense, smaller p-values could signal a more implausible null hypothesis. But there is no statistical basis for assuming, for example, that a p-value of 0.001 means the null hypothesis is 10 times less likely than if the p-value were 0.01. Unfortunately, this is exactly how many researchers interpret p-values.

The table below (largely copied from Geoff Cumming’s excellent YouTube video Dance of the P Values) summarizes this “folk wisdom,” which no sane statistician would ever endorse. Yet the “wisdom” persists, getting passed down from generation to generation of biomedical researchers like a cherished family recipe.

Somewhere along the line someone made the following blunder: if p < 0.05 is “significant”, then p < 0.01 must be “highly significant”, and p < 0.001 must be very highly significant indeed! And if p-values don’t at least approach 0.05 (according to this folk tale), we’ll take this as evidence for the null hypothesis and conclude that our hypothesis is wrong.

These rules of thumb, despite having no evident basis, infected the whole research establishment, which actually began to reward researchers for obtaining small p-values! (Publications in more prestigious journals, more grant funding, more honors and awards, etc., etc.) Consequently, new researchers quickly develop conditioned emotional responses to the p-values they see when they run a statistical analysis. These reactions can be illustrated with these scenes from Seinfeld:

But are these interpretations and emotions justified? For this way of thinking about p-values to be rational, the following things must be true:

1. When there definitely is an effect, p-values should be less than 0.001 most of the time, AND less than 0.01 almost all the time; p-values larger than 0.10 should be exceedingly rare. (Why else would scientists feel comfortable concluding that p < 0.001 means there definitely is an effect and p > 0.10 means there is no effect?)
2. A result with p < 0.001 should provide a more accurate estimate of the true effect than a result with p = 0.01 or p = 0.05. (Why else would scientists celebrate such a result?)
3. If we obtain a particular p-value, say 0.01, and we repeat the exact same experiment (same population, same sample size, same methods) on a different sample, we should obtain a p-value of similar magnitude as the original, say between 0.005 and 0.05. In other words, the p-value should be very reliable across replications. (Why else would objective scientists get so intellectually and emotionally invested in a single statistic?)

How can we know if these 3 expectations are reasonable? Well, here’s the problem – we don’t regularly encounter evidence in our day-to-day experiences that challenges any of these beliefs.

For Points 1 and 2, when do we ever know in the real world what the true effect is? If we already knew the truth, we wouldn’t need to run an experiment! So how can we possibly evaluate whether these rules of thumb capture the truth?

And for Point 3, although replication is one of the most important principles of the scientific method, full and exact replications are surprisingly uncommon. That’s because funding agencies and research journals favor innovative studies and novel findings. Replication gets a lot of lip service, but—truth be told—if a grant proposal aimed only to copy previously published work, it would be dead-on-arrival.

What is more common is partial replication, in which some elements of a previous experimental design are retained but with some parameters modified or new ones introduced. Consequently, if these studies end up contradicting the findings of the previous study, the discrepancies will usually be attributed to “methodological differences” rather than to the unreliability of the original findings. In this way, exaggerated or false findings could persist for decades before losing credibility.

So what are we to do? Just resign ourselves to ignorance and hope this folk wisdom about interpreting p-values is right?

## How Reliable is the P-value?

It’s practically impossible in real life to repeat an experiment 100 times under the exact same conditions. And it’s completely impossible to know the true magnitude of an effect beforehand. But we don’t need real-world experiments to test our assumptions about p-values. With just a bit of computer code, we are granted god-like powers to create a virtual reality in which we get to define the truth—and create an army of virtual scientists who try to discover that truth!

You are about to see the results of one such simulation, in which we made 100 virtual scientists conduct a randomized clinical trial to test the efficacy of this fake drug from The Onion: “Made by Pfizer, Despondex is the first drug designed to treat the symptoms of excessive perkiness.”

Here is the R code that builds our virtual world. Don’t worry if you don’t know R. We’re about to explain what this does in plain English. This is just to show you that it only takes a few lines of code to run these 100 experiments!
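(The original post showed the code as a screenshot. Here’s a sketch of what it plausibly looked like, based on the description that follows—the name simulate_trial and the use of replicate come from the text; the seed and exact layout are assumptions.)

```r
# Sketch of the simulation; details are reconstructed, not the original code
set.seed(1)

simulate_trial <- function() {
  drug    <- rnorm(63, mean = 0,   sd = 1)  # Despondex group: perk scores ~ N(0, 1)
  placebo <- rnorm(63, mean = 0.5, sd = 1)  # placebo group: perkier by 0.5 SD
  t.test(drug, placebo)$p.value             # p-value for the null of no difference
}

# 100 virtual scientists each run the identical experiment
p_values <- replicate(100, simulate_trial())

# How the 100 p-values break down by range
table(cut(p_values, c(0, 0.001, 0.01, 0.05, 0.10, 1)))
```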

First, we define what happens when one of our virtual scientists conducts a clinical trial on Despondex (the simulate_trial function).

The Despondex treatment group is made up of 63 patients, randomly drawn from a population whose “perk” scores are normally distributed, with a mean of 0 and a standard deviation of 1.

The placebo control group is also made up of 63 patients randomly drawn from a population whose perk scores are normally distributed with a standard deviation of 1, but with a different mean of 0.5.

Thus, we have created a virtual reality in which Despondex reduces perkiness by half a standard deviation. To give some real-world examples of drugs that improve patients by about 0.5 standard deviations, antihypertensives for hypertension and corticosteroids for asthma have an effect size of 0.56, and antipsychotics for schizophrenia have an effect size of 0.51.

Finally, with the replicate function, we have 100 different virtual scientists run this experiment and use a t-test to evaluate the null hypothesis (that there is no difference between Despondex and placebo).

To summarize, we have 100 independent scientists run the exact same experiment, drawing from the exact same populations, so any differences in the p-values they obtain can only be attributed to random sampling error.

So…

• How many of our virtual scientists do you think will obtain p < 0.001?
• How many will get a p-value between 0.001 and 0.01, or between 0.01 and 0.05?
• How many will find a p-value that “approaches significance”?
• How many will find a p-value greater than 0.10 (and incorrectly conclude that there is no effect)?

Below are the results reported by 5 of our virtual scientists (all of whom, for some reason, appear to be identical clones of Elaine Benes from Seinfeld).

The solid vertical line is the null hypothesis: that there is 0 difference between groups.

The dashed vertical line is the difference that we know to be true. We know this is truth because we built the simulation!

The red dot is the mean difference between groups observed by each of our Elaines, and the horizontal bar is the 95% confidence interval for that difference.

First notice how, with the exact same truth:

• One Elaine finds a difference that is “approaching significance”
• One finds a result that is “very highly significant”
• The other three Elaines get something in between

Now take a closer look at that “very highly significant” result of p < 0.001.

This is the sort of dramatic finding that might be accepted by a top-tier journal such as JAMA.

But, it’s also a HUGE exaggeration of the true effect size! When someone tries to replicate this study, they’re unlikely to see this large of an effect.

Are you starting to see how this ties into the replication crisis?

Now look at the studies that actually get the true effect size about right. They have p-values of 0.005 and 0.015.

Do those effect estimates look all that different to you? Do you really think one is categorically more significant than the other?

Let’s look at another 5 of our Elaines.

Woah! P = 0.984?! How can this be? This last Elaine had the same sample size as all the other Elaines! And we know that there really is an effect because we built the simulation! The p-value should at least be “approaching significance,” right?

Anyone running a real experiment that obtained a p-value this large would conclude that the drug was ineffective and move on to a different research question.

But, this is why we say “fail to reject the null,” folks, because the null hypothesis can be wrong—even with a p-value of 0.98!

We know at least one of our Elaines is getting a JAMA paper while one might be contemplating a career change—even though they both ran the exact same experiment! Hardly seems fair, does it?

Here’s the breakdown for all 100 Elaines:

If you add these up, 80 Elaines got p < 0.05, and 20 Elaines got p > 0.05. That, my friends, is exactly what was supposed to happen. You see, we chose a sample size of 126 (63 per group) because this is the sample size that results in 80% statistical power for a t-test when the mean group difference is 0.5 standard deviations.

## Power vs. P-value

If you’ve taken statistics, you’re no doubt familiar with the “concept” of power. But how to calculate power (let alone interpret it) is probably not your specialty. So let’s review.

Power is the probability of obtaining a p-value less than 0.05 (or whatever significance threshold you specify) given that the hypothesized effect is true. This is kind of meta, huh?

We’re talking about a probability—calculated under the assumption that the null hypothesis is false—of obtaining another probability calculated under the assumption that the null hypothesis is true! Before your head explodes, let’s break this down:

1. You have a p-value, which tells you how frequently you expect to see data as extreme as yours when there really is no effect.
2. You have a threshold, usually p < 0.05, at which you reject the null hypothesis (that there is no effect).
3. You have power, which tells you how frequently you expect to meet that threshold when there really is an effect of a certain magnitude.

Power depends on three things: the sample size, the true effect size, and the p-value threshold you choose. As any one of those three things goes up, so does power. So, the power calculation tells us that, with a sample size of 126 and a true effect of 0.5:

• We have an 80% chance of obtaining a p-value less than 0.05
• And a 20% chance of obtaining a p-value of 0.05 or greater

True to form, 80 of our virtual scientists got a p-value less than 0.05, and 20 did not.
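If you want to check that power calculation yourself, R’s built-in power.t.test reproduces it:

```r
# Power of a two-sample t-test: 63 per group, true difference of 0.5 SD,
# significance threshold of 0.05
power.t.test(n = 63, delta = 0.5, sd = 1, sig.level = 0.05)$power
# just under 0.80
```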

Why did we pick 80% power? Because, just as p < 0.05 is the conventional criterion for “statistical significance,” 80% power is the conventional criterion for “adequate power.” Does that number come from some kind of sophisticated cost-benefit analysis? Not at all. We can trace its origin to an eminent statistician, Jacob Cohen, but his reasoning on this matter is entirely subjective.

In Cohen’s mind, a false positive felt about four times worse than a false negative. So if we view 5% as an acceptable risk for a false positive (i.e., the p < 0.05 criterion), then the acceptable risk for a false negative should be 4 times that, or 20%. The complement of this is, voila, 80% power.

Apparently this bit of hand-waving—an arbitrary multiplier of 4 applied to an already arbitrary criterion—was met with a chorus of “Yeah, that sounds about right!”, because it was quickly adopted as the “gold standard” for determining sample-size requirements. We, however, think this is yet another example of how human beings, including scientists, are deeply flawed when it comes to their intuitions about probability. In particular, due to some quirk of the human brain, most people will simplify a chance percentage into one of three categories:

Weirdly, if we told you there’s an 80% chance that your experiment will find a significant result, that experiment somehow sounds like a good investment of your time and resources. But if we told you that 1 out of 5 of your experiments will fail to find a significant result, for no other reason than you didn’t recruit enough study volunteers…well, you’d probably recruit more volunteers! (We don’t need to point out that an 80% chance of success is the same as a 1-in-5 chance of failure, do we?)

Most researchers have at least some vague notion that a non-significant result might be a function of “low power” rather than a flaw in their hypothesis or study design. However, in our experience, few researchers are aware of the dramatic effect that power has on the distribution of p-values. Hence, the whole “approaching significance” fallacy – the false notion that if a study is underpowered, you may not get a p-value less than 0.05, but you will get a p-value approaching 0.05 (probably less than 0.10 and almost certainly less than 0.20). If there really is an effect, of course.

Remember that simulation we created? It was just a colorful reproduction of one of the simulations reported by Halsey et al. in their 2015 Nature Methods paper, The fickle P value generates irreproducible results.

The following figure is from that paper. Each of these histograms shows how many “virtual scientists” obtained a p-value within a given range—except they ran 1000 replications instead of 100 and simulated 4 different sample sizes, from left to right:

• 10 per group (N = 20, power = 18%)
• 30 per group (N = 60, power = 48%)
• 64 per group (N = 128, power = 80%)
• 100 per group (N = 200, power = 94%)
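As a sanity check, those four power figures can be reproduced with power.t.test (assuming, as in our simulation, a true difference of 0.5 standard deviations):

```r
# Per-group sample sizes from the four panels of the Halsey et al. figure
n_per_group <- c(10, 30, 64, 100)

powers <- sapply(n_per_group, function(n)
  power.t.test(n = n, delta = 0.5, sd = 1, sig.level = 0.05)$power)

round(powers, 2)  # climbs with sample size: roughly 18%, 48%, 80%, 94%
```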

As you can see, the assumption that p-values will “approach” 0.05 when power is low is complete and utter bullshit.

When power is close to 50%, getting a p-value greater than 0.20 is just as likely as getting a p-value between 0.05 and 0.20.

And when power is less than 20%, getting a p-value greater than 0.20 is more than twice as likely as getting a p-value between 0.05 and 0.20. Those p-values aren’t just failing to approach significance—they’re running away from it!

Only when power is very high (94%) would those “folk-interpretations” have some merit. When there definitely is an effect and the sample size is large, the vast majority of experiments will find a p-value less than 0.001, and p-values greater than 0.10 will be exceedingly rare.

However, for smaller samples (or weaker effects), believing that a p-value tells you something definitive about the likelihood of your hypothesis is incredibly foolish. And, hate to burst your bubble, but except for large, multisite clinical trials—published studies are rarely this well-powered. Much of preclinical research is a lot closer to 20% power than it is to 80% power, (much less 94% power).

So why would researchers bother running an experiment with 20% power? Even if the true effect is as large as they think, there’s only a 1-in-5 chance they will get a p-value less than 0.05.

Well, a study with 20% power is a lot easier to conduct (smaller samples = faster and cheaper). And researchers are often short on time or money (usually both). “So,” they reason, “We’ll run a small pilot study. Sure, we probably won’t find anything significant, but if we see a trend then we can use it as “preliminary data” in a grant proposal to fund a larger study. Who knows, maybe the true effect size (and our actual power) is larger than we think, and, even if it’s not, a 1-in-5 chance ain’t so bad—we might get lucky. Anyway, it doesn’t hurt to try….”

That may sound reasonable, but here’s the problem: power doesn’t just impact the odds of getting p < 0.05. It also impacts the precision of the test statistic.

The likelihood that an observed effect approximates the true effect has nothing to do with p-values; it has everything to do with power.

We got a hint of this before, when we saw in our simulation that the result with the smallest p-value, less than 0.001, was a far worse estimate of the true effect than a result with p = 0.08.

Let’s return to those simulations. But this time, instead of just showing you five replications at a time, we’re going to show you all 100, sorted from the replication with the smallest observed effect (the one that made Elaine cry) to the largest. Remember, that was a simulation with 80% power. This time we’re also going to show you what the same simulation looks like with 20% power.

The thing that probably pops out at you the most is just how much wider the confidence intervals are for 20% power vs. 80% power, but look closer because that’s not the only difference.

Notice how with 80% power, the vast majority of observed mean differences (the red dots at the center of each confidence interval) hug pretty close to the true mean difference (the vertical dashed line). Now look at what happens with 20% power. Hardly any of the observed differences get close to the true difference. It is very unlikely that an observed effect size from a low-powered pilot study will be anywhere close to the true effect size.

But wait, there’s more! The rectangles we’ve drawn at the bottom of each plot highlight the proportion of studies that substantially overestimate the true effect size. Notice how the rectangle for 20% power is a lot bigger than for 80% power? That means underpowered studies are much more likely to exaggerate effect sizes, and the magnitude of this exaggeration is much greater. With 80% power, even the most optimistic replication finds an effect size that is only about double the true effect size. With 20% power, a doubling or even tripling of the true effect size is commonplace.

But wait, there’s still more! Take a closer look at the handful of replications under 20% power that do get the true effect right (centered around the dashed vertical line). Notice how the confidence intervals for every single one of them include the solid vertical line at 0, the null hypothesis. That means they all have p-values greater than 0.05. And that means that the true mean difference will never be judged statistically significant by an underpowered study.

And now look at the replications whose confidence intervals don’t include 0, meaning their p-values will be less than 0.05. Every single one of them observes an effect size 2-3 times larger than the true effect size. That means that the only way for an underpowered study to find a statistically significant difference is to gravely overestimate the true difference.

This is important, so it bears repeating. An underpowered study will NEVER produce a statistically significant result that closely estimates the true effect. The only way for an underpowered study to show statistical significance is to grossly overestimate the true effect.
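A quick simulation makes this concrete (a sketch with assumed numbers, not the authors’ code: about 10 per group gives roughly 20% power for a true difference of 0.5 standard deviations):

```r
set.seed(1)
true_diff <- 0.5

# Run many small trials; record the observed difference only when p < 0.05
sig_diff <- replicate(10000, {
  a <- rnorm(10, mean = 0,         sd = 1)  # control group
  b <- rnorm(10, mean = true_diff, sd = 1)  # treatment group
  d <- mean(b) - mean(a)
  if (t.test(b, a)$p.value < 0.05) d else NA
})

share_sig      <- mean(!is.na(sig_diff))      # near 0.2, as the power predicts
avg_sig_effect <- mean(sig_diff, na.rm = TRUE) # roughly double the true 0.5
c(share_sig, avg_sig_effect)
```

Every “significant” result from these underpowered trials overshoots the true effect, because with only 10 per group the observed difference has to be huge before the t-test crosses the 0.05 threshold.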

## The Effect of Underpowered Studies

So, in light of this knowledge, let’s revisit the rationale for doing a small pilot study. What typically happens with such a study?

Well, assuming it has 20% power, there’s about a 2-in-5 chance that the observed effect is so small (or even opposite to the hypothesized direction) that the researchers decide not to use it in a grant proposal. They file the results away and move on to a different idea for a pilot study.

Then there’s about a 2-in-5 chance that they get the evidence of the “trend” they’re looking for (i.e. an effect that is not statistically significant but that is within 0.25 standard deviations of what they hoped to see). They write a grant proposal to collect more data so they’ll have decent statistical power. This is the best possible outcome because collecting more data is exactly what they need to do.

Then there’s a 1-in-5 chance that they will get a significant result right off the bat, even with their small, crappy sample. And here’s where things go off the rails because most researchers seem to believe that low power acts only in one direction—to make effects appear weaker than they actually are. Of course, we just showed you that the effect-size distortion cuts both ways. For every underpowered study that underestimates the true effect, there is an underpowered study that overestimates it.

But, not appreciating this, our researchers will reason that since they made it past the p < 0.05 hurdle, they’re in the clear. They’ll be very excited to have discovered a much larger effect than they’d hoped for, and they’ll convince themselves that they should have expected this large of an effect all along. Now, if the researchers publish their study, what started off as a “win” turns into a “curse.” Here’s why.

Since they have a small p-value, they—not the other 4 labs that tried this experiment—will be the first ones to get published. If it’s a really novel, sexy finding, it may even get published in a high-impact journal. That seems like an unqualified win. They made it to the finish line first, so they get the fame and glory. Problem is, the reason they got to the finish line first is because they ran a fast and cheap study and got lucky. Which means they’ve reported an effect that is grossly exaggerated. And that’s the winner’s curse.

Other researchers may be inspired to follow up with similar studies, using this original study’s results to determine sample-size requirements. They’ll plug the published effect size into a power analysis, not adjusting for the fact that the true effect is likely 2 to 3 times smaller.
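To see why that matters, here’s the arithmetic with the standard normal-approximation sample-size formula for a two-sample comparison. The effect sizes (`true_d`, `published_d`) are hypothetical numbers chosen to illustrate a 2x winner’s-curse inflation.

```python
import math

def n_per_group(d, z_alpha=1.959964, z_beta=0.841621):
    """Per-group n for a two-sample z-test: alpha = 0.05 two-sided, 80% power."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

true_d = 0.25        # hypothetical true standardized effect
published_d = 0.50   # winner's-curse estimate, inflated 2x
n_planned = n_per_group(published_d)   # what the follow-up lab budgets for
n_needed = n_per_group(true_d)         # what it would actually take
print(n_planned, n_needed)             # 63 vs 252 per group
```

Required sample size scales with the inverse square of the effect size, so a 2x inflated estimate leaves the follow-up study with a quarter of the sample it actually needs.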

Or, more likely (because power analysis is a tiresome chore), they’ll just mimic the same sample size that the first paper used. They’ll foolishly reason, “If a study published in a top-tier journal used a given sample size, then a replication with the same sample size will also find significant results.”

Either way, now we’ll have several labs running severely underpowered experiments. And, because the power is still stuck at 20%, only one out of five will replicate the result. (Recall how fewer than 25% of drug-discovery experiments could be reproduced? That’s very much in line with what we’d expect from this scenario.)

So now what happens?

Well, the 20% of labs that replicate the result right away will jump on the publication bandwagon. So you’ll get a few publications supporting the original reported effect size. This lends the finding even more credibility, so the 4 out of 5 labs that didn’t see an effect will assume they did something wrong. They’ll scrutinize their work for mistakes, and make a second or third attempt before they try to publish a negative result. And, if the chance of getting a significant result is 1/5 for a single study, a researcher who conducts three such studies has an almost 1/2 chance that at least one will find statistical significance. (Third time really is the charm?)
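The arithmetic behind that “almost 1/2” is just the complement rule applied to three independent tries:

```python
power = 0.20  # chance a single underpowered replication "works"
# P(at least one significant result in three attempts) = 1 - P(all three fail)
p_at_least_one = 1 - (1 - power) ** 3
print(round(p_at_least_one, 3))  # 0.488
```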

Now we start to accumulate selection bias in the published literature. Instead of reporting all the replication attempts—failures and successes alike—researchers tend to misattribute failures to experimenter error and successes to experimenter rigor. Reporting the failed replication attempts just complicates the narrative (or so the rationalization goes). So they only report the successful replication, and the underpowered studies continue.

Of course, there’s an equal chance that a lab attempts three replications, and not one of them pans out. These researchers may get on their high horse and assert that the original finding was bogus and that there is no such effect. But their paper is likely to be reviewed by peers who have observed the effect, whose research careers have become quite invested in the effect, and who are going to feel put out by this attitude. They’ll look for methodological deviations from their prior work and use them to argue that the paper should be rejected.

And so the exaggerated finding may “coast” on publication bias for a while, but eventually the failed replication attempts will get published. A new consensus will emerge that “the literature is mixed.” Some studies report strong effects, and some studies report no effects. But the human desire to make sense out of noise is strong. So researchers that believe the effect is real will look for differences between the “positive” and “negative” studies. And because, in the real world, no two studies are ever exactly the same, they’ll likely find some.

Maybe studies that observe the effect tend, by chance, to have more females than males, so maybe the effect is conditional on gender. More underpowered studies are run to test that hypothesis, and the problem mushrooms from there. Some studies will report that females show a stronger effect than males; others will report that males show a stronger effect than females. Then researchers will look for yet another variable capable of reversing the conditional effect of gender.

Eventually (hopefully?) someone will re-run the original study with a much larger sample and finally conclude there really is an effect. But it’s only half as large as what the original study found, and the sample size needs to be about four times larger to replicate it consistently (required sample size scales with the inverse square of the effect size, so halving the effect quadruples the n).

(Think this is just a story we made up to scare you? Read this: Why Most Discovered True Associations Are Inflated, Ioannidis, Epidemiology, 2008.)

## Real-World Examples of Underpowered Studies

Are you about done with simulations? Well, do we have a treat for you! Check out the real-world example below (Sattar et al., Lancet, 2010), a meta-analysis investigating the increased risk of diabetes from statins:

Note the resemblance between this real-world meta-analysis and the Despondex simulations we just talked about. The true effect size appears to be an odds ratio of 1.09, and the estimated effect from any one trial dances around this value. Some trials show a “highly significant” effect (JUPITER & PROSPER). Some “approach significance” (HPS & ALLHAT). Several show no effect, or an effect trending in the opposite direction. The p-values are all over the place, but note that every confidence interval includes the true effect.

Still not convinced? Here’s another real-world clinical example, published by McCarren, Hampp, Gerhard, & Mehta in Am J Health-Syst Pharm. 2017;74:1262-6. This table shows clear disagreement among the subsamples, with p-values spanning a wide range:

I mean, just look at the first potential predictor of asthma. It’s the exact same data that you are pulling subsamples from, but you get all colors of the rainbow in terms of p-values.
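You can reproduce that rainbow yourself. The sketch below uses made-up data, not the paper’s: it builds one fixed dataset containing a modest true effect, then draws repeated subsamples and watches the p-values swing.

```python
import random, math

def p_value(xs, ys):
    """Two-sample z-test p-value (normal approximation, sd assumed to be 1)."""
    n = len(xs)
    z = (sum(xs) / n - sum(ys) / n) / math.sqrt(2 / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
# One fixed "population": a modest true difference of 0.3 sd between groups.
pop_a = [random.gauss(0.3, 1) for _ in range(2000)]
pop_b = [random.gauss(0.0, 1) for _ in range(2000)]

# Twenty subsamples of the SAME data, twenty very different p-values.
pvals = [p_value(random.sample(pop_a, 50), random.sample(pop_b, 50))
         for _ in range(20)]
print(f"min p = {min(pvals):.3f}, max p = {max(pvals):.3f}")
```

Same underlying data every time, yet some subsamples look “highly significant” and others look like nothing at all.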

But don’t take our word for it.

Reason and evidence not enough? Need an appeal to authority? We’ve got you covered!

Statisticians have been sounding the alarm about p-values for years to no avail. So in 2016 the American Statistical Association (ASA) finally took the unusual step of issuing a public proclamation. We encourage you to read their full statement, but here are the bullet points. (We hope we’ve already convinced you of these.)

• P-values can indicate how incompatible the data are with a specified statistical model.
• P-values do not measure the probability that the studied hypothesis is true or the probability that the data were produced by random chance alone.
• Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
• Proper inference requires full reporting and transparency.
• A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
• By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

### This is Only the Tip of the Iceberg…

Believe it or not, all of the problems we’ve talked about so far refer to the best-case scenario. That is, we’ve only discussed what happens when there really is an effect but, because of inadequate sample size, p-values are not reliable indicators of its significance. Combined with publication bias, this has led to an epidemic of initial findings that report exaggerated effects that cannot be replicated.

It is a repeating pattern that we are all too familiar with. An initial pilot study finds an amazing result that gets everyone excited, only to leave everyone disappointed when a larger clinical study finds a far weaker result. Now you understand why this happens and how a little more numeracy (that’s like literacy, but for probability and statistics) could help us get off this roller coaster.

But we haven’t begun to talk about what happens when there really is no effect but practices like data dredging, p-hacking, and HARKing (hypothesizing after results are known) lead researchers to misidentify pure noise as statistically significant. In a future article, we’ll explain what goes horribly wrong when researchers use the same data sets to both discover and test their hypotheses. (Spoiler: those two things need to be kept separate!). And we’ll advise you how to best evaluate research in light of all this and how you can be part of the movement to improve reproducibility in science.

Brandon Dyson, December 14, 2020

