ScienceNews – INDEPENDENT JOURNALISM SINCE 1921 – Top 10 ways to save science from its statistical self – Null hypothesis testing should be banished, estimating effect sizes should be emphasized – By Tom Siegfried – JULY 10, 2015 AT 9:00 AM

@ EDITORIAL 20 MARCH 2019 – Nature 567, 283 (2019) – It’s time to talk about ditching statistical significance – Looking beyond a much used and abused measure would make science harder, but better. ´´Statistical significance is so deeply integrated into scientific practice and evaluation that extricating it would be painful. When working out which methods to use, researchers should also focus as much as possible on actual problems. People who will duel to the death over abstract theories on the best way to use statistics often agree on results when they are presented with concrete scenarios. Researchers should seek to analyse data in multiple ways to see whether different analyses converge on the same answer.´´

@ Moving to a World Beyond “p < 0.05”

@ WHAT DOES STATISTICALLY SIGNIFICANT MEAN? by Jeff Sauro, PhD | October 21, 2014

@ There was no statistically significant difference. Now what? (Não houve diferença estatística significativa. E agora?)

@ COMMENT 20 MARCH 2019 – Scientists rise up against statistical significance – Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.

@ YouTube Videos, Links, Images, Texts and Social Networks

@ Time – Internet Society – Society – Research – Science Facts – Graphics – World History – Technologies – Human Longevity – DNA – Cell – Statistics – People – Person – Reference – Biostatistics – Scientific Discoveries

The diffusion of relevant information and knowledge is always essential for a country’s progress!!


links-of-this-blog-part-1

links-of-my-blog-part-2

Informações relevantes relacionadas à leitura de livros e seus aspectos interligados no ambiente escolar – Rodrigo Nunes Cal – Parte 1 @ RELEVANT INFORMATION RELATED TO BOOK READING AND ITS INTERCONNECTED ASPECTS IN THE SCHOOL ENVIRONMENT – RODRIGO NUNES CAL – PART 1 

Informações relevantes relacionadas à leitura de livros e seus aspectos interligados no ambiente escolar – Rodrigo Nunes Cal – Parte 2 @ RELEVANT INFORMATION RELATED TO BOOK READING AND ITS INTERCONNECTED ASPECTS IN THE SCHOOL ENVIRONMENT – RODRIGO NUNES CAL – PART 2

  • Mestrado – Dissertation – Tabelas, Figuras e Gráficos – Tables, Figures and Graphics – ´´My´´ Dissertation


Impact_Fator-wise_Top100Science_Journals

GRUPO_AF1 – ´´My´´ Dissertation

GRUPO AFAN 1 – ´´My´´ Dissertation

GRUPO_AF2 – ´´My´´ Dissertation

GRUPO AFAN 2 – ´´My´´ Dissertation

Slides – mestrado – ´´My´´ Dissertation

CARCINÓGENO DMBA EM MODELOS EXPERIMENTAIS

DMBA CARCINOGEN IN EXPERIMENTAL MODELS

Avaliação da influência da atividade física aeróbia e anaeróbia na progressão do câncer de pulmão experimental (Evaluation of the influence of aerobic and anaerobic physical activity on the progression of experimental lung cancer) – Summary – Resumo – ´´My´´ Dissertation

Do the downloads !!! Share!! Thanks!!

´´The world’s people need very efficient research and projects resulting in very innovative drugs, vaccines, therapeutical substances, medical devices and other technologies according to the age, the genetics and the medical records of each person. Then the treatment, diagnosis and prognosis will be much more efficient and better, of course.´´ Rodrigo Nunes Cal

https://science1984.wordpress.com/2021/08/14/do-the-downloads-of-very-important-detailed-and-innovative-data-of-the-world-about-my-dissertation-like-the-graphics-i-did-about-the-variations-of-weights-of-all-mice-control/


Avaliação da influência da atividade física aeróbia e anaeróbia na progressão do câncer de pulmão experimental – Summary – Resumo – Download

Mestrado – Dissertation – Tabelas, Figuras e Gráficos – Tables, Figures and Graphics – Download

Redefine Statistical Significance – Download

´´We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.´´ Daniel J. Benjamin, James O. Berger, […] Valen E. Johnson. Redefine statistical significance. Nature Human Behaviour 2, 6–10 (2018). https://www.nature.com/articles/s41562-017-0189-z

A world beyond p < 0.05 (Um mundo além de p < 0,05) « Sandra Merlo – Fonoaudiologia da Fluência

https://en.wikipedia.org/wiki/American_Statistical_Association

RoyalStatSoc – https://www.youtube.com/channel/UC83oOOF9lg-g1XMT_UK1tUw

http://www.sciencenoise.org/multiverse-book.html

Tom Siegfried

@tom_siegfried

Contributing Correspondent, Science News. Advisory Group, Knowable Magazine. Author of The Number of the Heavens, published in September 2019. Washington · multiversebook.com · Joined April 2013

By Tom Siegfried – JULY 10, 2015 AT 9:00 AM

https://www.sciencenews.org/author/tom-siegfried


Tom Siegfried – Contributing Correspondent

Tom Siegfried is a contributing correspondent. He was editor in chief of Science News from 2007 to 2012 and managing editor from 2014 to 2017. He is the author of the blog Context. In addition to Science News, his work has appeared in Science, Nature, Astronomy, New Scientist and Smithsonian. Previously he was the science editor of The Dallas Morning News. He is the author of four books: The Bit and the Pendulum (Wiley, 2000); Strange Matters (National Academy of Sciences’ Joseph Henry Press, 2002); A Beautiful Math (Joseph Henry Press, 2006); and The Number of the Heavens (Harvard University Press, 2019). Tom was born in Lakewood, Ohio, and grew up in nearby Avon. He earned an undergraduate degree from Texas Christian University with majors in journalism, chemistry and history, and has a master of arts with a major in journalism and a minor in physics from the University of Texas at Austin. His awards include the American Geophysical Union’s Robert C. Cowen Award for Sustained Achievement in Science Journalism, the Science-in-Society Award from the National Association of Science Writers, the American Association for the Advancement of Science–Westinghouse Award, the American Chemical Society’s James T. Grady–James H. Stack Award for Interpreting Chemistry for the Public, and the American Institute of Physics Science Communication Award.


All Stories by Tom Siegfried

  1. SCIENCE & SOCIETY – We’ve covered science for 100 years. Here’s how it has — and hasn’t — changed. Today’s researchers pursue knowledge with more detail and sophistication, but some of the questions remain the same. April 2, 2021
  2. COSMOLOGY – Physicists’ devotion to symmetry has led them astray before. If dark matter WIMPs are mythical, they join the ancient idea that the planets moved in circles. March 31, 2021
  3. COSMOLOGY – The dark matter mystery deepens with the demise of a reported detection. Early results from an experiment designed to replicate one that hinted that dark matter is made up of WIMPs came up empty-handed. March 24, 2021
  4. SCIENCE & SOCIETY – Top 10 science anniversaries to celebrate in 2021. DNA, Maxwell’s demon and Dolly the Sheep all make the list. But the one we’re most excited about at Science News is our centennial. February 3, 2021
  5. PHYSICS – ‘Fundamentals’ shows how reality is built from a few basic ingredients. In ‘Fundamentals,’ physics Nobel laureate Frank Wilczek shares essential lessons from physics. January 26, 2021
  6. SCIENCE & SOCIETY – ‘The Light Ages’ illuminates the science of the so-called Dark Ages. In telling the story of a monk who contributed to astronomy, a new book shows that science didn’t take a break during the Middle Ages. January 8, 2021
  7. SPACE – Top 10 questions I’d ask an alien from the Galactic Federation. An interview with E.T. would be a journalist’s dream, but it’s not very likely. December 9, 2020
  8. SCIENCE & SOCIETY – These are science’s Top 10 erroneous results. A weird form of life, a weird form of water and faster-than-light neutrinos are among the science findings that have not survived closer scrutiny. November 10, 2020
  9. SPACE – Hope for life on Venus survives for centuries against all odds. Early scientists often assumed that Venus, though hotter than Earth, hosted life. September 25, 2020
  10. SCIENCE & SOCIETY – A new Galileo biography draws parallels to today’s science denialism. ‘Galileo and the Science Deniers’ delivers a fresh assessment of the life of a scientific legend and offers lessons for today. August 11, 2020
  11. PHYSICS – How understanding nature made the atomic bomb inevitable. On the anniversary of Hiroshima, here’s a look back at the chain reaction of basic discoveries that led to nuclear weapons. August 6, 2020
  12. SPACE – Self-destructive civilizations may doom our search for alien intelligence. A lack of signals from space may also be bad news for Earthlings. July 6, 2020




Top 10 ways to save science from its statistical self

Null hypothesis testing should be banished, estimating effect sizes should be emphasized

[Figure: graph of P values] WORTHLESS – A P value is the probability of recording a result as large or more extreme than the observed data if there is in fact no real effect. P values are not a reliable measure of evidence. (S. Goodman, adapted by A. Nandy)


By Tom Siegfried

JULY 10, 2015 AT 9:00 AM

Second of two parts (read part 1)

Statistics is to science as steroids are to baseball. Addictive poison. But at least baseball has attempted to remedy the problem. Science remains mostly in denial.

True, not all uses of statistics in science are evil, just as steroids are sometimes appropriate medicines. But one particular use of statistics — testing null hypotheses — deserves the same fate with science as Pete Rose got with baseball. Banishment.

Numerous experts have identified statistical testing of null hypotheses — the staple of scientific methodology — as a prime culprit in rendering many research findings irreproducible and, perhaps more often than not, erroneous. Many factors contribute to this abysmal situation. In the life sciences, for instance, problems with biological agents and reference materials are a major source of irreproducible results, a new report in PLOS Biology shows. But troubles with “data analysis and reporting” are also cited. As statistician Victoria Stodden recently documented, a variety of statistical issues lead to irreproducibility. And many of those issues center on null hypothesis testing. Rather than furthering scientific knowledge, null hypothesis testing virtually guarantees frequent faulty conclusions.


“For more than half a century, distinguished scholars have published damning critiques of null hypothesis significance testing and have described the damage it does,” psychologist Geoff Cumming wrote last year in Psychological Science. “Very few defenses of null hypothesis significance testing have been attempted; it simply persists.”

A null hypothesis assumes that a factor being tested produces no effect (or an effect no different from some other factor). If experimental data are sufficiently unlikely (given the no-effect assumption), scientists reject the null hypothesis and infer that there is an effect. They call such a result “statistically significant.”

Statistical significance has nothing to do with actual significance, though. A statistically significant effect can be trivially small. Or even completely illusory.
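To see how divorced the two notions are, consider a minimal simulation sketch in Python (the effect size and sample size here are invented for illustration): an effect of two-hundredths of a standard deviation, far too small to matter in practice, still earns a tiny P value once the sample is large enough.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A practically negligible effect: group means differ by 0.02 standard deviations.
n = 100_000
control = rng.normal(0.00, 1.0, n)
treated = rng.normal(0.02, 1.0, n)

t_stat, p = stats.ttest_ind(treated, control)
print(f"P = {p:.2g}")  # usually far below 0.05: "statistically significant", yet trivial
```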

Cumming, of the Statistical Cognition Laboratory at La Trobe University in Victoria, Australia, advocates a “new statistics” that dumps null hypothesis testing in favor of saner methods. “We need to make substantial changes to how we usually carry out research,” he declares. In his Psychological Science paper (as well as in a 2012 book), he describes an approach that focuses on three priorities: estimating the magnitude of an effect, quantifying the precision of that estimate, and combining results from multiple studies to make those estimates more precise and reliable.

Implementing Cumming’s new statistics will require more than just blog posts lamenting the problems. Someone needs to promote the recommendations that he and other worried researchers have identified. As in a Top 10 list. And since the situation is urgent, appended right here and now are my Top 10 ways to save science from its statistical self:

10. Ban P values

Statistical significance tests are commonly calculated using P values (the probability of observing the measured data — or data even more extreme — if the null hypothesis is correct). It has been repeatedly demonstrated that P values are essentially worthless as a measure of evidence. For one thing, different samples from the same dataset can yield dramatically different P values. “Any calculated value of P could easily have been very different had we merely taken a different sample, and therefore we should not trust any P value,” writes Cumming.
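Cumming’s warning is easy to verify. The sketch below (a toy setup of our own, not from his paper) repeats the identical experiment 20 times, drawing fresh samples from the same population each time, and prints the resulting P values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Same population, same real effect (0.5 SD), same sample size, every single run.
effect, n = 0.5, 32

p_values = sorted(
    stats.ttest_ind(rng.normal(effect, 1.0, n), rng.normal(0.0, 1.0, n)).pvalue
    for _ in range(20)
)
print([f"{p:.3f}" for p in p_values])
# A typical run spans P < 0.01 up to P > 0.5: identical experiments,
# wildly different "evidence" by the P value's lights.
```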

On occasion some journal editors have banned P values, most recently in Basic and Applied Social Psychology. Ideally there would be an act of Congress (and a United Nations resolution) condemning P values and relegating them to the same category as chemical weapons and smoking in airplanes.

9. Emphasize estimation

A major flaw with yes-or-no null hypothesis tests is that the right answer is almost always “no.” In other words, a null hypothesis is rarely true. Seldom would anything worth testing have an absolutely zero effect. With enough data, you can rule out virtually any null hypothesis, as psychologists John Kruschke and Torrin Liddell of Indiana University point out in a recent paper.

The important question is not whether there’s an effect, but how big the effect is. And null hypothesis testing doesn’t help with that. “The result of a hypothesis test reveals nothing about the magnitude of the effect or the uncertainty of its estimate, which are the key things we should want to know,” Kruschke and Liddell assert.

Cumming advocates using statistics to estimate actual effect magnitude in preference to null hypothesis testing, which “prompts us to see the world as black or white, and to formulate our research aims and make our conclusions in dichotomous terms — an effect … exists or it does not.”
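In code, the estimation habit is a small change of emphasis. A minimal sketch (with simulated data standing in for a real experiment): report how big the effect is and how precisely it is known, rather than a reject-or-retain verdict.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, 60)  # placeholder data; a real analysis would load measurements
treated = rng.normal(0.4, 1.0, 60)

# Point estimate of the effect and the precision of that estimate.
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / 60 + control.var(ddof=1) / 60)
low, high = stats.t.interval(0.95, df=118, loc=diff, scale=se)
print(f"estimated effect: {diff:.2f} SD, 95% CI ({low:.2f}, {high:.2f})")
```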

8. Rethink confidence intervals

Estimating an effect’s magnitude isn’t enough — you need to know how precise that estimate is. Such precision is commonly expressed by a confidence interval, similar to an opinion poll’s margin of error. Confidence intervals are already widely reported, but often not properly examined to infer an effect’s true importance. A result trumpeted as “statistically significant,” for instance, may have a confidence interval covering such a wide range that the actual effect size could be either tiny or titanic. And when a large confidence interval overlaps with zero effect, researchers typically conclude that there is no effect, even though the confidence interval is so large that a sizable effect should not be ruled out.

“Researchers must be disabused of the false belief that if a finding is not significant, it is zero,” Kruschke and Liddell write. “This belief has probably done more than any of the other false beliefs about significance testing to retard the growth of cumulative knowledge in psychology.” And no doubt in other fields as well.

But properly calculated and interpreted, confidence intervals are useful measures of precision, Cumming writes. It’s just important that “we should not lapse back into dichotomous thinking by attaching any particular importance to whether a value of interest lies just inside or just outside our confidence interval.”

7. Improve meta-analyses

As Kruschke and Liddell note, each sample from a population will produce a different estimate of an effect. So no one study ever provides a thoroughly trustworthy result. “Therefore, we should combine the results of all comparable studies to derive a more stable estimate of the true underlying effect,” Kruschke and Liddell conclude. Combining studies in a “meta-analysis” is already common practice in many fields, such as biomedicine, but the conditions required for legitimate (and reliable) meta-analysis are seldom met. For one thing, it’s important to acquire results from every study on the issue, but many are unpublished and unavailable. (Studies that find a supposed effect are more likely to get published than those that don’t, biasing meta-analyses toward validating effects that aren’t really there.)

A meta-analysis of multiple studies can in principle sharpen the precision of an effect’s estimated size. But meta-analyses typically create very large samples in order that small alleged effects can achieve statistical significance in a null hypothesis test. A better approach, Cumming argues, is to emphasize estimation and forget hypothesis testing.

“Meta-analysis need make no use of null hypothesis statistical testing,” he writes. “Indeed, NHST has caused some of its worst damage by distorting the results of meta-analysis.”
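For readers who have not met it, the estimation core of a meta-analysis is short enough to show in full. Here is the standard fixed-effect, inverse-variance combination, with invented study results:

```python
import math

# Effect estimates and standard errors from four hypothetical comparable studies.
estimates = [0.30, 0.18, 0.42, 0.25]
std_errs = [0.15, 0.10, 0.20, 0.12]

# Inverse-variance weights: more precise studies count for more.
weights = [1.0 / se**2 for se in std_errs]
pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

print(f"pooled effect: {pooled:.2f} "
      f"(95% CI {pooled - 1.96*pooled_se:.2f} to {pooled + 1.96*pooled_se:.2f})")
```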

6. Create a Journal of Statistical Shame

OK, maybe it could have a nicer name, but a journal devoted to analyzing the statistical methods and reasoning in papers published elsewhere would help call attention to common problems. In particular, any studies that get widespread media attention could be analyzed for statistical and methodological flaws. Supposedly a journal’s editorial processes and peer review already monitor such methodological issues before publication. Supposedly.

5. Better guidelines for scientists and journal editors

Recently several scientific societies and government agencies have recognized the need to do something about the unsavory statistical situation. In May, the National Science Foundation’s advisory committee on social, behavioral and economic sciences issued a report on replicability in science. Its recommendations mostly suggested studying the issue some more. And the National Institutes of Health has issued “Principles and Guidelines for Reporting Preclinical Research.” But the NIH guidelines are general enough to permit the use of the same flawed methods. Journals should, for instance, “require that statistics be fully reported in the paper, including the statistical test used … and precision measures” such as confidence intervals. But it doesn’t help much to report the statistics fully if the statistical methods are bogus to begin with.

More to the point, a recent document from the Society for Neuroscience specifically recommends against “significance chasing” and advocates closer attention to issues in experimental design and data analysis that can sabotage scientific rigor.

One especially pertinent recommendation, emphasized in a report published June 26 in Science, is “data transparency.” That can mean many things, including sharing of experimental methods and the resulting data so other groups can reproduce experimental findings. Sharing of computer code is also essential in studies relying on high-powered computational analysis of “big data.”

4. Require preregistration of study designs

An issue related to transparency, also emphasized in the guidelines published in Science, is the need for registering experimental plans in advance. Many of science’s problems stem from shoddy statistical practices, such as choosing what result to report after you have tested a bunch of different things and found one that turned out to be statistically significant. Funders and journals should require that all experiments be preregistered, with clear statements about what is being tested and how the statistics will be applied. (In most cases, for example, it’s important to specify in advance how big your sample will be, disallowing the devious technique of continuing to enlarge the sample until you see a result you like.) Such a preregistration program has already been implemented for clinical trials. And websites to facilitate preregistration more generally already exist, such as Open Science Framework.
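The devious technique in that parenthesis is worth seeing in numbers. In the simulation sketch below (a toy of our own), the null hypothesis is true by construction, yet re-testing after every extra batch of ten observations and stopping at the first P < 0.05 produces “significant” findings far more often than the nominal 5 percent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
trials, false_positives = 2000, 0

for _ in range(trials):
    data = list(rng.normal(0.0, 1.0, 10))       # the null is true: no effect at all
    while len(data) <= 100:
        if stats.ttest_1samp(data, 0.0).pvalue < 0.05:
            false_positives += 1                # "significant" purely by peeking
            break
        data.extend(rng.normal(0.0, 1.0, 10))   # enlarge the sample and test again

print(f"false-positive rate with optional stopping: {false_positives/trials:.0%}")
```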

3. Promote better textbooks

Guidelines helping scientists avoid some of the current system’s pitfalls can help, but they fall short of radically revising the whole research enterprise. A more profound change will require better brainwashing of scientists when they are younger. Therefore, it would be a good idea to attack the problem at its source: textbooks.

Science’s misuse of statistics originates with traditional textbooks, which have taught flawed methods, such as the use of P values, for decades. One new introductory textbook is in the works from Cumming, possibly to appear next year. Many other useful texts exist, including some devoted to Bayesian methods, which Kruschke and Liddell argue are the best way to implement Cumming’s program to kill null hypotheses. But getting the best texts to succeed in a competitive marketplace will require some sort of promotional effort. Perhaps a consortium of scientific societies could convene a committee to create a new text, or at least publicize those that would do more to solve science’s problems than perpetuate them.  

2. Alter the incentive structure

Science has ignored the cancer on its credibility for so long because the incentives built into the system pose a cultural barrier to change. Research grants, promotion and tenure, fame and fortune are typically achieved through publishing a lot of papers. Funding agencies, journal editors and promotion committees all like quantitative metrics to make their decision making easy (that is, requiring little thought). The current system is held hostage by those incentives.

In another article in the June 26 Science, a committee of prominent scientific scholars argues that quality should be emphasized over quantity. “We believe that incentives should be changed so that scholars are rewarded for publishing well rather than often,” the scholars wrote. In other words, science needs to make the commitment to intelligent assessment over mindless quantification.

1. Rethink media coverage of science

One of the incentives driving journals — and therefore scientists — is the desire for media attention. In recent decades journals have fashioned their operations to promote media coverage of the papers they publish — often to the detriment of enforcing rigorous scientific standards for the papers they publish. Science journalists have been happy (well, not all of us) to participate in this conspiracy. As I’ve written elsewhere, the flaws in current statistical practices promote publication of papers trumpeting the “first” instance of a finding, results of any sort in “hot” research fields, or findings likely to garner notice because they are “contrary to previous belief.” These are precisely the qualities that journalists look for in reporting the news. Consequently many research findings reported in the media turn out to be wrong — even though the journalists are faithfully reporting what scientists have published.

As one prominent science journalist once articulated this situation to me, “The problem with science journalism is science.” So maybe it’s time for journalists to strike back, and hold science accountable to the standards it supposedly stands for. I’ll see what I can do.

Follow me on Twitter: @tom_siegfried

Questions or comments on this article? E-mail us at feedback@sciencenews.org

About Tom Siegfried

Tom Siegfried is a contributing correspondent. He was editor in chief of Science News from 2007 to 2012 and managing editor from 2014 to 2017.

Related Stories

  1. MATH – Science is heroic, with a tragic (statistical) flaw. By Tom Siegfried, July 2, 2015
  2. MATH – Courts’ use of statistics should be put on trial. By Tom Siegfried, June 20, 2016
  3. PSYCHOLOGY – ‘Replication crisis’ spurs reforms in how science studies are done. By Bruce Bower, August 27, 2018

More Stories from Science News on Math

  1. MATH – How one physicist is unraveling the mathematics of knitting. By Lakshmi Chandrasekaran, January 26, 2021
  2. MATH – A documentary and a Bollywood film highlight two disparate paths in mathematics. By Emily Conover, October 15, 2020
  3. COMPUTING – How next-gen computer generated maps detect partisan gerrymandering. By Sujata Gupta, September 7, 2020
  4. ANIMALS – Calculating a dog’s age in human years is harder than you think. By Bethany Brookshire, July 8, 2020
  5. MATH – To cook a perfect steak, use math. By Emily Conover, April 13, 2020
  6. SCIENCE & SOCIETY – The U.S. has resisted the metric system for more than 50 years. By Cassie Martin, April 3, 2020
  7. MATH – How large a gathering is too large during the coronavirus pandemic? By Dana Mackenzie, April 2, 2020
  8. SPACE – NASA icon Katherine Johnson has died at the age of 101. By Emily Conover, February 24, 2020


EDITORIAL  

It’s time to talk about ditching statistical significance

Looking beyond a much used and abused measure would make science harder, but better.

https://www.nature.com/articles/d41586-019-00874-8

Editorial

Moving to a World Beyond “p < 0.05”

Ronald L. Wasserstein, Allen L. Schirm & Nicole A. Lazar

Pages 1–19 | Published online: 20 Mar 2019

American Statistical Association

https://www.amstat.org/

https://www.facebook.com/AmstatNews

https://ww2.amstat.org/meetings/jsm/2021/index.cfm


[Figure: bar chart made of measuring cylinders filled with different amounts of varied coloured liquids] Some statisticians are calling for P values to be abandoned as an arbitrary threshold of significance. Credit: Erik Dreyer/Getty

Fans of The Hitchhiker’s Guide to the Galaxy know that the answer to life, the Universe and everything is 42. The joke, of course, is that truth cannot be revealed by a single number.

And yet this is the job often assigned to P values: a measure of how surprising a result is, given assumptions about an experiment, including that no effect exists. Whether a P value falls above or below an arbitrary threshold demarcating ‘statistical significance’ (such as 0.05) decides whether hypotheses are accepted, papers are published and products are brought to market. But using P values as the sole arbiter of what to accept as truth can also mean that some analyses are biased, some false positives are overhyped and some genuine effects are overlooked.

Change is in the air. In a Comment in this week’s issue, three statisticians call for scientists to abandon statistical significance. The authors do not call for P values themselves to be ditched as a statistical tool — rather, they want an end to their use as an arbitrary threshold of significance. More than 800 researchers have added their names as signatories. A series of related articles is being published by the American Statistical Association this week (R. L. Wasserstein et al. Am. Stat. https://doi.org/10.1080/00031305.2019.1583913; 2019). “The tool has become the tyrant,” laments one article.

Statistical significance is so deeply integrated into scientific practice and evaluation that extricating it would be painful. Critics will counter that arbitrary gatekeepers are better than unclear ones, and that the more useful argument is over which results should count for (or against) evidence of effect. There are reasonable viewpoints on all sides; Nature is not seeking to change how it considers statistical analysis in evaluation of papers at this time, but we encourage readers to share their views (see go.nature.com/correspondence).

If researchers do discard statistical significance, what should they do instead? They can start by educating themselves about statistical misconceptions. Most important will be the courage to consider uncertainty from multiple angles in every study. Logic, background knowledge and experimental design should be considered alongside P values and similar metrics to reach a conclusion and decide on its certainty.

When working out which methods to use, researchers should also focus as much as possible on actual problems. People who will duel to the death over abstract theories on the best way to use statistics often agree on results when they are presented with concrete scenarios.

Researchers should seek to analyse data in multiple ways to see whether different analyses converge on the same answer. Projects that have crowdsourced analyses of a data set to diverse teams suggest that this approach can work to validate findings and offer new insights.
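As a small illustration of that advice (with simulated data standing in for a real study), three quite different analyses of the same two groups can be run and compared:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
a = rng.normal(0.0, 1.0, 50)   # simulated stand-ins for two groups of measurements
b = rng.normal(0.5, 1.0, 50)

# Route 1: classic t-test. Route 2: rank-based test with weaker assumptions.
print("t-test P:       ", stats.ttest_ind(b, a).pvalue)
print("Mann-Whitney P: ", stats.mannwhitneyu(b, a, alternative="two-sided").pvalue)

# Route 3: bootstrap interval for the mean difference, nearly assumption-free.
boots = [rng.choice(b, b.size).mean() - rng.choice(a, a.size).mean() for _ in range(5000)]
print("bootstrap 95% CI for difference:", np.percentile(boots, [2.5, 97.5]))
# If the three routes agree, the finding is on firmer ground.
```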

In short, be sceptical, pick a good question, and try to answer it in many ways. It takes many numbers to get close to the truth.

Nature 567, 283 (2019). doi: https://doi.org/10.1038/d41586-019-00874-8


COMMENT  20 MARCH 2019

Scientists rise up against statistical significance

Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.


[Illustration by David Parkins]

When was the last time you heard a seminar speaker claim there was ‘no difference’ between two groups because the difference was ‘statistically non-significant’?

If your experience matches ours, there’s a good chance that this happened at the last talk you attended. We hope that at least someone in the audience was perplexed if, as frequently happens, a plot or table showed that there actually was a difference.

How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see? For several generations, researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome)1. Nor do statistically significant results ‘prove’ some other hypothesis. Such misconceptions have famously warped the literature with overstated claims and, less famously, led to claims of conflicts between studies where none exists.

We have some proposals to keep scientists from falling prey to these misconceptions.

Pervasive problem

Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.

For example, consider a series of analyses of unintended effects of anti-inflammatory drugs2. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).
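Both of those quoted P values can be reproduced from nothing more than the published risk ratios and intervals. A short check in Python, using the standard normal approximation on the log risk-ratio scale:

```python
import math
from scipy.stats import norm

def p_from_rr_ci(rr, low, high):
    """Two-sided P value against the null RR = 1, from a 95% CI on the risk-ratio scale."""
    se_log_rr = (math.log(high) - math.log(low)) / (2 * 1.96)  # SE of log(RR)
    z = math.log(rr) / se_log_rr
    return 2 * norm.sf(abs(z))

print(p_from_rr_ci(1.2, 0.97, 1.48))  # later study:   ~0.091
print(p_from_rr_ci(1.2, 1.09, 1.33))  # earlier study: ~0.0003
```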

It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us (see ‘Beware false conclusions’).

These and similar errors are widespread. Surveys of hundreds of articles have found that statistically non-significant results are interpreted as indicating ‘no difference’ or ‘no effect’ in around half (see ‘Wrong interpretations’ and Supplementary Information).

In 2016, the American Statistical Association released a statement in The American Statistician warning against the misuse of statistical significance and P values. The issue also included many commentaries on the subject. This month, a special issue in the same journal attempts to push these reforms further. It presents more than 40 papers on ‘Statistical inference in the 21st century: a world beyond P < 0.05’. The editors introduce the collection with the caution “don’t say ‘statistically significant’”3. Another article4 with dozens of signatories also calls on authors and journal editors to disavow those terms.

We agree, and call for the entire concept of statistical significance to be abandoned.

We are far from alone. When we invited others to read a draft of this comment and sign their names if they concurred with our message, 250 did so within the first 24 hours. A week later, we had more than 800 signatories — all checked for an academic affiliation or other indication of present or past work in a field that depends on statistical modelling (see the list and final count of signatories in the Supplementary Information). These include statisticians, clinical and medical researchers, biologists and psychologists from more than 50 countries and across all continents except Antarctica. One advocate called it a “surgical strike against thoughtless testing of statistical significance” and “an opportunity to register your voice in favour of better scientific practices”.

We are not calling for a ban on P values. Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating for an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis5.

Quit categorizing

The trouble is human and cognitive more than it is statistical: bucketing results into ‘statistically significant’ and ‘statistically non-significant’ makes people think that the items assigned in that way are categorically different6–8. The same problems are likely to arise under any proposed statistical alternative that involves dichotomization, whether frequentist, Bayesian or otherwise.

Unfortunately, the false belief that crossing the threshold of statistical significance is enough to show that a result is ‘real’ has led scientists and journal editors to privilege such results, thereby distorting the literature. Statistically significant estimates are biased upwards in magnitude and potentially to a large degree, whereas statistically non-significant estimates are biased downwards in magnitude. Consequently, any discussion that focuses on estimates chosen for their significance will be biased. On top of this, the rigid focus on statistical significance encourages researchers to choose data and methods that yield statistical significance for some desired (or simply publishable) result, or that yield statistical non-significance for an undesired result, such as potential side effects of drugs — thereby invalidating conclusions.

The pre-registration of studies and a commitment to publish all results of all analyses can do much to mitigate these issues. However, even results from pre-registered studies can be biased by decisions invariably left open in the analysis plan9. This occurs even with the best of intentions.

Again, we are not advocating a ban on P values, confidence intervals or other statistical measures — only that we should not treat them categorically. This includes dichotomization as statistically significant or not, as well as categorization based on other statistical measures such as Bayes factors.

One reason to avoid such ‘dichotomania’ is that all statistics, including P values and confidence intervals, naturally vary from study to study, and often do so to a surprising degree. In fact, random variation alone can easily lead to large disparities in P values, far beyond falling just to either side of the 0.05 threshold. For example, even if researchers could conduct two perfect replication studies of some genuine effect, each with 80% power (chance) of achieving P < 0.05, it would not be very surprising for one to obtain P < 0.01 and the other P > 0.30. Whether a P value is small or large, caution is warranted.
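That two-replication claim can be checked by simulation. In the sketch below (toy numbers of our own: 64 observations per group gives roughly 80% power to detect a half-standard-deviation effect at P < 0.05), strongly discordant P value pairs turn up routinely.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def one_study(effect=0.5, n=64):   # n = 64 per group: roughly 80% power at alpha = 0.05
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    return stats.ttest_ind(b, a).pvalue

pairs = [(one_study(), one_study()) for _ in range(10_000)]
discordant = sum(min(p) < 0.01 and max(p) > 0.30 for p in pairs)
print(f"P < 0.01 in one replication, P > 0.30 in the other: "
      f"{discordant / len(pairs):.1%} of pairs")   # typically a few per cent: not rare
```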

We must learn to embrace uncertainty. One practical way to do so is to rename confidence intervals as ‘compatibility intervals’ and interpret them in a way that avoids overconfidence. Specifically, we recommend that authors describe the practical implications of all values inside the interval, especially the observed effect (or point estimate) and the limits. In doing so, they should remember that all the values between the interval’s limits are reasonably compatible with the data, given the statistical assumptions used to compute the interval7,10. Therefore, singling out one particular value (such as the null value) in the interval as ‘shown’ makes no sense.

We’re frankly sick of seeing such nonsensical ‘proofs of the null’ and claims of non-association in presentations, research articles, reviews and instructional materials. An interval that contains the null value will often also contain non-null values of high practical importance. That said, if you deem all of the values inside the interval to be practically unimportant, you might then be able to say something like ‘our results are most compatible with no important effect’.

When talking about compatibility intervals, bear in mind four things. First, just because the interval gives the values most compatible with the data, given the assumptions, it doesn’t mean values outside it are incompatible; they are just less compatible. In fact, values just outside the interval do not differ substantively from those just inside the interval. It is thus wrong to claim that an interval shows all possible values.

Second, not all values inside are equally compatible with the data, given the assumptions. The point estimate is the most compatible, and values near it are more compatible than those near the limits. This is why we urge authors to discuss the point estimate, even when they have a large P value or a wide interval, as well as discussing the limits of that interval. For example, the authors above could have written: ‘Like a previous study, our results suggest a 20% increase in risk of new-onset atrial fibrillation in patients given the anti-inflammatory drugs. Nonetheless, a risk difference ranging from a 3% decrease, a small negative association, to a 48% increase, a substantial positive association, is also reasonably compatible with our data, given our assumptions.’ Interpreting the point estimate, while acknowledging its uncertainty, will keep you from making false declarations of ‘no difference’, and from making overconfident claims.

Third, like the 0.05 threshold from which it came, the default 95% used to compute intervals is itself an arbitrary convention. It is based on the false idea that there is a 95% chance that the computed interval itself contains the true value, coupled with the vague feeling that this is a basis for a confident decision. A different level can be justified, depending on the application. And, as in the anti-inflammatory-drugs example, interval estimates can perpetuate the problems of statistical significance when the dichotomization they impose is treated as a scientific standard.

Last, and most important of all, be humble: compatibility assessments hinge on the correctness of the statistical assumptions used to compute the interval. In practice, these assumptions are at best subject to considerable uncertainty7,8,10. Make these assumptions as clear as possible and test the ones you can, for example by plotting your data and by fitting alternative models, and then reporting all results.

Whatever the statistics show, it is fine to suggest reasons for your results, but discuss a range of potential explanations, not just favoured ones. Inferences should be scientific, and that goes far beyond the merely statistical. Factors such as background evidence, study design, data quality and understanding of underlying mechanisms are often more important than statistical measures such as P values or intervals.

The objection we hear most against retiring statistical significance is that it is needed to make yes-or-no decisions. But for the choices often required in regulatory, policy and business environments, decisions based on the costs, benefits and likelihoods of all potential consequences always beat those made based solely on statistical significance. Moreover, for decisions about whether to pursue a research idea further, there is no simple connection between a P value and the probable results of subsequent studies.

What will retiring statistical significance look like? We hope that methods sections and data tabulation will be more detailed and nuanced. Authors will emphasize their estimates and the uncertainty in them — for example, by explicitly discussing the lower and upper limits of their intervals. They will not rely on significance tests. When P values are reported, they will be given with sensible precision (for example, P = 0.021 or P = 0.13) — without adornments such as stars or letters to denote statistical significance and not as binary inequalities (P  < 0.05 or P > 0.05). Decisions to interpret or to publish results will not be based on statistical thresholds. People will spend less time with statistical software, and more time thinking.

Our call to retire statistical significance and to use confidence intervals as compatibility intervals is not a panacea. Although it will eliminate many bad practices, it could well introduce new ones. Thus, monitoring the literature for statistical abuses should be an ongoing priority for the scientific community. But eradicating categorization will help to halt overconfident claims, unwarranted declarations of ‘no difference’ and absurd statements about ‘replication failure’ when the results from the original and replication studies are highly compatible. The misuse of statistical significance has done much harm to the scientific community and those who rely on scientific advice. P values, intervals and other statistical measures all have their place, but it’s time for statistical significance to go.

Nature 567, 305–307 (2019). doi: https://doi.org/10.1038/d41586-019-00857-9

References

  1. Fisher, R. A. Nature 136, 474 (1935).
  2. Schmidt, M. & Rothman, K. J. Int. J. Cardiol. 177, 1089–1090 (2014).
  3. Wasserstein, R. L., Schirm, A. & Lazar, N. A. Am. Stat. https://doi.org/10.1080/00031305.2019.1583913 (2019).
  4. Hurlbert, S. H., Levine, R. A. & Utts, J. Am. Stat. https://doi.org/10.1080/00031305.2018.1543616 (2019).
  5. Lehmann, E. L. Testing Statistical Hypotheses 2nd edn 70–71 (Springer, 1986).
  6. Gigerenzer, G. Adv. Meth. Pract. Psychol. Sci. 1, 198–218 (2018).
  7. Greenland, S. Am. J. Epidemiol. 186, 639–645 (2017).
  8. McShane, B. B., Gal, D., Gelman, A., Robert, C. & Tackett, J. L. Am. Stat. https://doi.org/10.1080/00031305.2018.1527253 (2019).
  9. Gelman, A. & Loken, E. Am. Sci. 102, 460–465 (2014).
  10. Amrhein, V., Trafimow, D. & Greenland, S. Am. Stat. https://doi.org/10.1080/00031305.2018.1543137 (2019).

SUPPLEMENTARY INFORMATION

1. Supp info for Amrhein et al Comment_data V2

    Latest on:

    Research data

    Research managementWant the games industry to share data? Share yoursCORRESPONDENCE Genomics data: the broken promise is to Indigenous peopleCORRESPONDENCE A guide to the Nature IndexNATURE INDEX 

    Jobs from Nature Careers 

    Nature Briefing

    An essential round-up of science news, opinion and analysis, delivered to your inbox every weekday.Email addressYes! Sign me up to receive the daily Nature Briefing email. I agree my information will be processed in accordance with the Nature and Springer Nature Limited Privacy Policy.Sign up

    Related Articles

    Subjects

    Sign up to Nature Briefing

    An essential round-up of science news, opinion and analysis, delivered to your inbox every weekday.Email addressYes! Sign me up to receive the daily Nature Briefing email. I agree my information will be processed in accordance with the Nature and Springer Nature Limited Privacy Policy.Sign up

    Nature ISSN 1476-4687 (online)

    nature.com sitemap

    Nature portfolio

    Discover content

    Publishing policies

    Author & Researcher services

    Libraries & institutions

    Advertising & partnerships

    Career development

    Regional websites

    Legal & Privacy

    Illustration by David ParkinsPDF version

    When was the last time you heard a seminar speaker claim there was ‘no difference’ between two groups because the difference was ‘statistically non-significant’?

    If your experience matches ours, there’s a good chance that this happened at the last talk you attended. We hope that at least someone in the audience was perplexed if, as frequently happens, a plot or table showed that there actually was a difference.

    How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see? For several generations, researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome)1. Nor do statistically significant results ‘prove’ some other hypothesis. Such misconceptions have famously warped the literature with overstated claims and, less famously, led to claims of conflicts between studies where none exists.

    We have some proposals to keep scientists from falling prey to these misconceptions.

    Pervasive problem

    Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.

    For example, consider a series of analyses of unintended effects of anti-inflammatory drugs2. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

    Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).

    It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us (see ‘Beware false conclusions’).

    Illustration by David ParkinsPDF version

    When was the last time you heard a seminar speaker claim there was ‘no difference’ between two groups because the difference was ‘statistically non-significant’?

    If your experience matches ours, there’s a good chance that this happened at the last talk you attended. We hope that at least someone in the audience was perplexed if, as frequently happens, a plot or table showed that there actually was a difference.

    How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see? For several generations, researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome)1. Nor do statistically significant results ‘prove’ some other hypothesis. Such misconceptions have famously warped the literature with overstated claims and, less famously, led to claims of conflicts between studies where none exists.

    We have some proposals to keep scientists from falling prey to these misconceptions.

    Pervasive problem

    Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.

    For example, consider a series of analyses of unintended effects of anti-inflammatory drugs2. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

    Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).

    It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us (see ‘Beware false conclusions’).
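Readers who want to verify the "(our calculation)" P values can recover them from the published risk ratios and interval limits. A minimal sketch in Python, assuming the usual normal (Wald) approximation on the log risk-ratio scale; the helper function and the interval limits 0.97–1.48 and 1.09–1.33 (implied by the quoted −3%/+48% and +9%/+33% risk changes) are our back-calculation, not code from the authors:

```python
# Recover an approximate two-sided P value from a risk ratio and its
# 95% confidence interval, via the normal approximation on the log scale.
from math import log, sqrt, erf

def p_from_rr_ci(rr, lo, hi):
    """Two-sided Wald P value implied by a risk ratio and its 95% CI."""
    se = (log(hi) - log(lo)) / (2 * 1.96)  # SE of log(RR) from the CI width
    z = log(rr) / se                        # Wald statistic
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * (1 - Phi(|z|))

print(f"{p_from_rr_ci(1.20, 0.97, 1.48):.3f}")   # newer study:  ~0.091
print(f"{p_from_rr_ci(1.20, 1.09, 1.33):.4f}")   # earlier study: ~0.0003
```

Both studies estimate the same 20% risk increase; the only real difference between them is precision, which is exactly the point made above.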

[Figure 'Beware false conclusions'. Source: V. Amrhein et al.]

    These and similar errors are widespread. Surveys of hundreds of articles have found that statistically non-significant results are interpreted as indicating ‘no difference’ or ‘no effect’ in around half (see ‘Wrong interpretations’ and Supplementary Information).

    In 2016, the American Statistical Association released a statement in The American Statistician warning against the misuse of statistical significance and P values. The issue also included many commentaries on the subject. This month, a special issue in the same journal attempts to push these reforms further. It presents more than 40 papers on ‘Statistical inference in the 21st century: a world beyond P < 0.05’. The editors introduce the collection with the caution “don’t say ‘statistically significant’”3. Another article4 with dozens of signatories also calls on authors and journal editors to disavow those terms.

    We agree, and call for the entire concept of statistical significance to be abandoned.

[Figure 'Wrong interpretations': survey data on misinterpreted non-significant results. Source: V. Amrhein et al.]

    We are far from alone. When we invited others to read a draft of this comment and sign their names if they concurred with our message, 250 did so within the first 24 hours. A week later, we had more than 800 signatories — all checked for an academic affiliation or other indication of present or past work in a field that depends on statistical modelling (see the list and final count of signatories in the Supplementary Information). These include statisticians, clinical and medical researchers, biologists and psychologists from more than 50 countries and across all continents except Antarctica. One advocate called it a “surgical strike against thoughtless testing of statistical significance” and “an opportunity to register your voice in favour of better scientific practices”.

    References

1. Fisher, R. A. Nature 136, 474 (1935).
2. Schmidt, M. & Rothman, K. J. Int. J. Cardiol. 177, 1089–1090 (2014).
3. Wasserstein, R. L., Schirm, A. & Lazar, N. A. Am. Stat. https://doi.org/10.1080/00031305.2019.1583913 (2019).
4. Hurlbert, S. H., Levine, R. A. & Utts, J. Am. Stat. https://doi.org/10.1080/00031305.2018.1543616 (2019).
5. Lehmann, E. L. Testing Statistical Hypotheses 2nd edn, 70–71 (Springer, 1986).
6. Gigerenzer, G. Adv. Meth. Pract. Psychol. Sci. 1, 198–218 (2018).
7. Greenland, S. Am. J. Epidemiol. 186, 639–645 (2017).
8. McShane, B. B., Gal, D., Gelman, A., Robert, C. & Tackett, J. L. Am. Stat. https://doi.org/10.1080/00031305.2018.1527253 (2019).
9. Gelman, A. & Loken, E. Am. Sci. 102, 460–465 (2014).
10. Amrhein, V., Trafimow, D. & Greenland, S. Am. Stat. https://doi.org/10.1080/00031305.2018.1543137 (2019).


      SUPPLEMENTARY INFORMATION

      Supplementary information to:
      Retire statistical significance
      Valentin Amrhein et al.
      Supplementary text to a Comment published in Nature 567, 305–307 (2019)
https://doi.org/10.1038/d41586-019-00857-9

Surveys summarized in 'Wrong interpretations':

Schatz P, Jay KA, McComb J, McLaughlin JR (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology 20:1053–1059.

Fidler F, Burgman MA, Cumming G, Buttrose R, Thomason N (2006). Impact of criticism of null-hypothesis significance testing on statistical reporting practices in conservation biology. Conservation Biology 20:1539–1544.

Hoekstra R, Finch S, Kiers HAL, Johnson A (2006). Probability as certainty: dichotomous thinking and the misuse of p values. Psychonomic Bulletin & Review 13:1033–1037.

Bernardi F, Chakhaia L, Leopold L (2017). 'Sing me a song with social significance': the (mis)use of statistical significance testing in European sociological research. European Sociological Review 33:1–15.

Survey*: Schatz et al. 2005
Error criterion: "using statistical tests to confirm the null, that there is no difference between groups." (Page 1054)
Most recent time period**: 2001–2004
Number of articles with errors: 81 (Page 1057)
Number of articles considered***: 170 (Page 1057)

Survey*: Fidler et al. 2006
Error criterion: "statistically nonsignificant results were interpreted as evidence of 'no effect' or 'no relationship'" (Page 1542)
Most recent time period**: 2005
Number of articles with errors: 42 (Table 1)
Number of articles considered***: 100 (Table 1)

Survey*: Hoekstra et al. 2006
Error criterion: "Phrases such as 'there is no effect,' 'there was no evidence for' (combined with an effect in the expected direction), 'the nonexistence of the effect,' 'no effect was found,' 'are equally affected,' 'there was no main effect,' 'A and B did not differ,' or 'the significance test reveals that there is no difference'" (Pages 1034–1035)
Most recent time period**: 2002–2004
Number of articles with errors: 145 (Table 1; 60% of 242)
Number of articles considered***: 259 (Table 1)

Survey*: Bernardi et al. 2017
Error criterion: "authors mechanically equate a statistically insignificant effect with a zero effect." (Page 2)
Most recent time period**: 2010–2014
Number of articles with errors: 134 (Table 2; 100% − 49% = 51% of 262)
Number of articles considered***: 262 (Table 1)

Overall: 51% (402/791) of articles from the five journals erroneously interpret statistically non-significant results as indicating "no effect".

* All four surveys examined the journal in which they were published. Fidler et al. 2006 also examined Biological Conservation.
** All four surveys compared two or more distinct time periods; we provide data for only the most recent time period.
*** Schatz et al. 2005 provide only the total number of articles considered. Fidler et al. 2006 and Hoekstra et al. 2006 provide the total number of articles considered and the number of articles that contained a statistically non-significant result and were thus eligible to make the error. Bernardi et al. 2017 provide only the total number of "articles qualifying for the review." For consistency across all four surveys, we use the total number of articles considered as the denominator when computing the percentage making an error; this is a conservatively low estimate of the proportion of articles that misinterpret statistically non-significant results because it includes articles ineligible to make the error (i.e., because they do not contain a statistically non-significant result) in the denominator.
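As a quick arithmetic check, the pooled 51% figure follows from summing the four survey rows above (a trivial Python sketch):

```python
# Pooled error rate across the four surveys (five journals),
# using the counts from the table rows above.
errors = [81, 42, 145, 134]        # articles with errors, per survey
considered = [170, 100, 259, 262]  # articles considered, per survey
print(f"{sum(errors)}/{sum(considered)} = {sum(errors) / sum(considered):.0%}")
# -> 402/791 = 51%
```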
Supplementary information to:
        Retire statistical significance (Comment in Nature 567, 305–307; 2019)
        Full list of co-signatories
        854 scientists from 52 countries are signatories to “Retire statistical significance”
        Compiled by Lilla Lovász, Zoological Institute, University of Basel, Switzerland
        Peter Aaby Bandim Health Project, Bissau, Guinea-Bissau
        Kevin Aagaard Colorado Parks and Wildlife, Fort Collins, CO, USA
        Preben Aavitsland Division of Infection Control and Environmental Health, Norwegian Institute of Public Health, Oslo, Norway
        Paul Acker School of Biological Sciences, University of Aberdeen, Aberdeen, UK
        Mohd Bakri Adam Institute for Mathematical Research, University Putra Malaysia, Selangor, Malaysia
        Amin Adibi Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
Matthew Agler Department of Microbiology, Friedrich Schiller University Jena, Jena, Germany
Daniel Aguirre-Acevedo Medical Research Institute, Medicine School, Universidad de Antioquia, Medellín, Colombia
        Thomas P. Ahern Department of Surgery, Larner College of Medicine, University of Vermont, Burlington, VT, USA
        Saiam Ahmed MRC Clinical Trials Unit, Institute of Clinical Trials and Methodology, University College London, London, UK
        Jeffrey Akiki School of Professional Studies, Columbia University, New York, NY, USA
        Teddy J. Akiki Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
        Yasser Albogami Clinical Pharmacy Department, King Saud University, Riyadh, Saudi Arabia
        Robert W Aldridge Institute of Health Informatics, University College London, London, UK
        Ayesha S. Ali Cancer Research UK Clinical Trials Unit, University of Birmingham, Edgbaston, Birmingham, UK
        Anna Alińska Department of Psychology, University of Warsaw, Warsaw, Poland
        Nico Alioravainen Department of Environmental and Biological Sciences, Faculty of Science and Forestry, University of Eastern Finland, Kuopio, Finland
        Christian L. Althaus Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
        Jonathan Amburgey Department of Psychology, Westminster College, Salt Lake City, UT, USA
        Avnika B. Amin Department of Epidemiology, Emory University, Atlanta, GA, USA
        Shobeir Amirnequiee Ivey Business School, University of Western Ontario, London, ON, Canada
        Rune Martens Andersen Centre for Cancer and Organ Diseases, Rigshospitalet, Copenhagen, Denmark
        Elizabeth B. Andrews Research Triangle Institute, Research Triangle Park, NC, USA
        Peter D. Angevine Department of Neurological Surgery, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
        Nils Anthes Institute of Evolution and Ecology, University of Tuebingen, Tuebingen, Germany
        Onyebuchi A. Arah Department of Epidemiology, University of California – Los Angeles, Los Angeles, CA, USA
        Gianluigi Ardissino Department of Pediatrics, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milano, Italy
        Cristina Ardura-Garcia Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
Corson N. Areshenkoff Centre for Neuroscience Studies, Queen's University, Kingston, ON, Canada
        Cono Ariti Cardiff University Medical School, Cardiff, UK
        Kellyn F. Arnold Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
        Jaan Aru Institute of Biology, Humboldt University of Berlin, Berlin, Germany
        Ann Aschengrau Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
        Peter H. Asdahl Department of Hematology, Aarhus University Hospital, Aarhus, Denmark
        Deborah Ashby School of Public Health, Imperial College London, London, UK
        Fortune Atri Kaiser Permanente Medical Group, Dept of Psychiatry, Los Angeles, CA, USA
Reto Auer Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland
        Matthieu Authier Observatoire PELAGIS – UMS 3462, CNRS-LRU, La Rochelle Université, La Rochelle, France
        Ignacio Avellino Sorbonne Université, Institut des Systèmes Intelligents et de Robotique (ISIR), CNRS, Paris, France
        Marc T. Avey Public Health Agency of Canada, Ottawa, ON, Canada
        Eli Awtrey Department of Management, Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, OH, USA
        Flavio Azevedo Cologne University, Cologne, Germany; Social Justice Lab, New York University, New York, NY, USA
        Rasmus Bååth Lund University, Lund, Sweden
Marko Bachl University of Hohenheim, Stuttgart, Germany
Lance Bachmeier Department of Economics, Kansas State University, Manhattan, KS, USA
        Kathleen E. Bachynski Department of Medicine, NYU Langone Health, New York, NY, USA
        J. Michael Bailey Department of Psychology, Northwestern University, Evanston, IL, USA
        Emmanuel S. Baja Institute of Clinical Epidemiology, NIH University of the Philippines-Manila, Manila, Philippines
        Andrew M. Baker Marketing Department, Fowler College of Business, San Diego State University, San Diego, CA, USA
        Daniel Hart Baker Department of Psychology, University of York, York, UK
        Nekane Balluerka University of the Basque Country, Donostia, Spain
        Layla J. Barkal Department of Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
        Carlos Javier Barrera-Causil Faculty of Applied and Exact Sciences, Metropolitan Technological Institute, Medellín, Colombia
        Malcolm Barrett University of Southern California, Los Angeles, CA, USA
        Dwight Barry Seattle Children’s Hospital, Seattle, WA, USA
        Fabian Bartsch IÉSEG School of Management, Paris, France
        Heidi Baseler Centre for Neuroscience, Hull York Medical School (HYMS), Department of Psychology, University of York, York, UK
        Leonardo Soares Bastos Programa de Computação Científica, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
        Alan Batterham School of Health and Social Care, Teesside University, Middlesbrough, UK
        Sebastian Baumeister Ludwig Maximilians Universität München, München, Germany
        Benjamin Beall Hatfield Consultants, North Vancouver, BC, Canada
        Adam Beavan Saarland University, Saarbrücken, Germany
        Bill Beavis Iowa State University, Ames, IA, USA
        Adan Z. Becerra Social & Scientific Systems, Silver Spring, MD, USA
        Nathaniel Beck Department of Politics, New York University, New York, NY, USA
        Betsy Jane Becker Measurement & Statistics, College of Education, Florida State University, Tallahassee, FL, USA
        Mumtaz Begum School of Public Health, University of Adelaide, Adelaide, Australia
        Eric Beh School of Mathematical & Physical Sciences, University of Newcastle, Callaghan, Australia
        Tara Behrend The George Washington University, Washington, DC, USA
        Daniel J. Benjamin Center for Economic and Social Research and Economics Department, University of Southern California, Los Angeles, CA, USA
        Christine Stabell Benn University of Southern Denmark, Odense, Denmark
        Rebecca Bentley Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
        Paola Berchialla Department of Clinical and Biological Sciences, University of Torino, Torino, Italy
Ron Berman Wharton School, University of Pennsylvania, Philadelphia, PA, USA
        Fabrice Berna University of Strasbourg, Psychiatry Department, Inserm U1114, Strasbourg, France
        Daniel Berner Zoological Institute, University of Basel, Basel, Switzerland
        José Berrios-Riquelme Departamento de Ciencias Sociales, Universidad de Tarapacá, Arica, Chile
        Lonni Besançon Media and Information Technology, Linköping University, Campus Norrköping, Norrköping, Sweden
        Yusuf K. Bilgiç Department of Mathematics, State University of New York at Geneseo, Geneseo, NY, USA
        Dean Billheimer Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
        Zachary Binney Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
        Tony Blakely Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia
        Neville M. Blampied School of Psychology Speech & Hearing, University of Canterbury, Christchurch, New Zealand
        Julian Blanc United Nations Environment Programme, Nairobi, Kenya
        Matthias Bluemke GESIS – Leibniz Institute for the Social Sciences, Mannheim, Germany
        Jeffrey D. Blume Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
Ulf Böckenholt Kellogg School of Management, Northwestern University, Evanston, IL, USA
        Lisa Bodnar Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA, USA
        Daniel J. Bogiatzis Gibbons The Behavioural Insights Team, Westminster, London, UK
        Matthieu P. Boisgontier Department of Movement Sciences, KU Leuven, Leuven, Belgium
        Niall Bolger Department of Psychology, Columbia University, New York, NY, USA
        Andrea Bonisoli-Alquati Department of Biological Sciences, California State Polytechnic University – Pomona, Pomona, CA, USA
        Roser Bono Quantitative Psychology Unit, Faculty of Psychology, University of Barcelona, Barcelona, Spain
        Michael Borenstein Biostat Inc., Englewood, NJ, USA
        David N. Borg The Hopkins Centre: Research for Rehabilitation and Resilience, Menzies Health Institute Queensland, Griffith University, Brisbane, Australia
        Nicolai T. Borgen Department of Sociology and Human Geography, University of Oslo, Oslo, Norway
        Blanca Borras-Bermejo Department of Preventive Medicine and Epidemiology, Vall d’Hebron University Hospital, Barcelona, Spain
        Michael Krabbe Borregaard Center for Macroecology, Evolution and Climate, University of Copenhagen, Copenhagen, Denmark
        Daniel A. Bowman School of Natural Sciences and Mathematics, Ferrum College, Ferrum, VA, USA
Michelle F. Bowman Forensecology, Guelph, ON, Canada
Randall Boyes Department of Public Health Sciences, Queen's University, Kingston, ON, Canada
        Michael T. Bradley Department of Psychology University of New Brunswick, Saint John, NB, Canada
        Eric T. Bradlow The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
        Patrick T. Bradshaw Division of Epidemiology and Biostatistics, School of Public Health, University of California – Berkeley, Berkeley, CA, USA
        Timothy Brathwaite Department of Civil and Environmental Engineering, University of California – Berkeley, Berkeley, CA, USA
        Joseph Braun Department of Epidemiology, Brown University, Providence, RI, USA
        Michael Braun Cox School of Business, Southern Methodist University, Dallas, TX, USA
        Francis Q. Brearley School of Science and the Environment, Manchester Metropolitan University, Manchester, UK
Jessica Y. Breland VA Palo Alto Health Care System, Menlo Park, CA, USA (views her own)
        Alexander Breskin Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
        Mathias Azevedo Bastian Bressel Centre for Biostatistics and Clinical Trials (BaCT), Peter MacCallum Cancer Centre, Melbourne, Australia
        Martins Briedis Swiss Ornithological Institute, Sempach, Switzerland
        William M. Briggs Independent researcher, New York, NY, USA
        Kristian Brock Cancer Research UK Clinical Trials Unit, University of Birmingham, Edgbaston, Birmingham, UK
        Daniel R. Brooks Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
James Brophy McGill University, Montreal, Canada
        Hernan A. Bruno Faculty of Management, Economics and Social Sciences, University of Cologne, Cologne, Germany
        Jatan Buch Department of Physics, Brown University, Providence, RI, USA
        William R. Buchanan Fayette County Public Schools, Lexington, KY, USA
        Catherine M. Bulka Department of Environmental Sciences and Engineering, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
        Martin Bulla Department of Coastal Systems, NIOZ Royal Netherlands Institute for Sea Research, ‘t Horntje (Texel), The Netherlands
        Kenneth P. Burnham Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, CO, USA
        Fausto Andres Bustos Carrillo Division of Epidemiology and Biostatistics, School of Public Health, University of California – Berkeley, Berkeley, CA, USA
Andrew W. Byrne School of Biological Sciences, Queen's University Belfast, Belfast, Northern Ireland, UK
        Danilo Bzdok Department of Psychiatry, Psychotherapy and Psychosomatics, RWTH Aachen University, Aachen, Germany
        Robert Calin-Jageman Department of Psychology, Dominican University, River Forest, IL, USA
        Jose Andres Calvache Department of Anesthesiology, Universidad del Cauca, Popayan, Colombia
        Emmanuelle Cam Laboratoire LEMAR, Université de Bretagne Occidentale; CNRS; IRD; IFREMER; Institut Universitaire Européen de la Mer, Plouzané, France
        Hank Campbell Science 2.0, Folsom, CA, USA
        Guillermo Campitelli College of Science, Health, Engineering & Education, Murdoch University, Perth, Australia
        Jean-François Campourcy Data scientist, Albi, France
        Francesco S. Cardona Department of Pediatrics, Medical University Vienna, Vienna, Austria
        Martine G. Caris Department of Internal Medicine, Amsterdam UMC, Amsterdam, The Netherlands
        John Carlin Murdoch Children’s Research Institute & The University of Melbourne, Parkville, Victoria, Australia
        Marc Carlson Seattle Children’s Research Institute, Seattle, WA, USA
        Daniel J. Carter Faculty of Epidemiology & Population Health, London School of Hygiene and Tropical Medicine, London, UK
        Humberto M. Carvalho Department of Physical Education, School of Sports, Federal University of Santa Catarina, Florianópolis, Brazil
        Joan A. Casey Berkeley School of Public Health, University of California – Berkeley, Berkeley, CA, USA
        Maria Eugenia Castellanos Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA, USA
        Christopher M. Castille Department of Management and Marketing, Nicholls State University, Thibodaux, LA, USA
        Hector Alejandro Cepeda-Freyre Faculty of Psychology, Benemérita Universidad Autonoma de Puebla, Puebla, México
        Ramakrishna Chakravarthi School of Psychology, University of Aberdeen, Aberdeen, Scotland, UK
        Armand Chatard Département de Psychologie, Université de Poitiers & CNRS, Poitiers, France
        Ricardo Chavarriaga Center for Neuroprosthetics, EPFL, Lausanne, Switzerland
        Gang Chen SSCC/NIMH, National Institutes of Health, Bethesda, MD, USA
        Boris Cheval Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
        Fanny Chevalier Departments of Computer Science and Statistical Sciences, University of Toronto, Toronto, Canada
        Alessandro Chiarotto Department of Health Sciences, Amsterdam Movement Sciences research institute, VU University, Amsterdam, The Netherlands
        Virginia Chiocchia Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
        Arnaud Chiolero University of Bern, Switzerland; Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, QC, Canada
        Catherine R. Chittleborough School of Public Health, University of Adelaide, Adelaide, Australia
        Anna Chołoniewska Institute of Genetics and Animal Breeding PAS, Jastrzębiec, Poland
        Yan Ru Choo National University of Singapore, Singapore
        Zad Chow Department of Population Health, New York University Langone Medical Center, New York, NY, USA
        John Christie Department of Psychology and Neuroscience, Dalhousie University, Halifax, NS, Canada
Katherine M. Chudoba MIS Department, Jon M Huntsman School of Business, Utah State University, Logan, UT, USA
Daniel Ciocca Oncology Laboratory, IMBECU, CCT, CONICET, Mendoza, Argentina
Valter Ciocca School of Audiology and Speech Sciences, University of British Columbia, Vancouver, BC, Canada
        Bart Claus IÉSEG School of Management, Paris, France
        Jessica Cobian American University, Washington, DC, USA
        Lincoln J. Colling Department of Psychology, University of Cambridge, Cambridge, UK
        David Colquhoun University College London, London, UK
        Aldo Compagnoni Martin Luther University Halle-Wittenberg, German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
        John Connolly Division of Pharmacoepidemiology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
        Jennie Connor Department of Preventive and Social Medicine, University of Otago, Dunedin, New Zealand
        Dario Consonni Epidemiology Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy
        Stefano Conti Improvement Analytics Unit, NHS England and The Health Foundation, London, UK
        Thomas D. Cook Trachtenberg School of Public Policy, George Washington University, Washington, DC, USA
        Andrew B. Cooper Seattle Children’s Hospital, Seattle, WA, USA
        Matthew R. Cooperberg Departments of Urology and Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
        Juan C. Correa School of Statistics, Faculty of Sciences, National University of Colombia, Medellín, Colombia
Dominique Costagliola Institut Pierre Louis d'Épidémiologie et de Santé Publique, Sorbonne Université, INSERM, Paris, France
        Denis Cousineau École de Psychologie, Université d’Ottawa, Ottawa, ON, Canada
        Antoine Coutrot CNRS, Laboratoire des Sciences du Numérique de Nantes (LS2N), University of Nantes, Nantes, France
        Christian Crandall Department of Psychology, University of Kansas, Lawrence, KS, USA
        Henk Cremers Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
        Suzie Cro Imperial College London, London, UK
        Deirdre Cronin-Fenton Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Geoff Cumming School of Psychology and Public Health, La Trobe University, Melbourne, Australia
        Jennifer Cutler Kellogg School of Management, Northwestern University, Evanston, IL, USA
        Stefan W. Czarniecki HIFU CLINIC Prostate Cancer Center, Warsaw, Poland
        Tomasz Czuba Department of Clinical Sciences, Lund University, Lund, Sweden
        Jonas D’Andrea Deparment of Mathematics, Westminster College, Salt Lake City, UT, USA
        Davide Dal Cason Research Center for Work and Consumer Psychology, ULB, Brussels, Belgium
        Ariella Dale Colorado Department of Public Health and Environment, Denver, CO, USA
        Per Damkier Department of Clinical Research, University of Southern Denmark, Odense, Denmark
Chitrang Dani Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Neuroscience Unit, Jakkur, Bengaluru, India
        Sameera Daniels Ramsey Decision Theoretics, Washington, DC, USA
        Nairanjana Dasgupta Department of Mathematics and Statistics, Washington State University, Pullman, WA, USA
Christoph Daube Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, Scotland, UK
        Frank Davenport Climate Hazards Center, Department of Geography, UC Santa Barbara, Santa Barbara, CA, USA
        George Davey Smith University of Bristol, Bristol Medical School, Oakfield House, Oakfield Grove, Bristol, UK
        Doug Davidson Basque Center on Cognition, Brain and Language, Donostia-San Sebastian, Spain
        Michiel R. de Boer Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
        Sara De Matteis National Heart & Lung Institute, Imperial College London, London, UK
        Thomas P. A. Debray Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
        Filip Děchtěrenko Institute of Psychology, Czech Academy of Sciences, Prague, Czech Republic
        Johan Decruyenaere Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
        Paddy C. Dempsey MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
        William Denault Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
        Stewart Denslow College of Charleston, Charleston, SC, USA
        Ashar Dhana Division of Dermatology, Groote Schuur Hospital, Cape Town, South Africa
        Subhra Sankar Dhar Department of Mathematics and Statistics, Indian Institute of Technology Kanpur (IIT), Kanpur, India
        Giorgio Maria Di Nunzio Department of Information Engineering, University of Padua, Padua, Italy
        Sofia Dias Centre for Reviews and Dissemination, University of York, York, UK
        Martin Dietz Center of Functionally Integrative Neuroscience, Institute of Clinical Medicine, Aarhus University, Aarhus, Denmark
        Stephan Dilchert Zicklin School of Business, Baruch College, City University of New York, New York, NY, USA
        Evanthia Dimara Sorbonne University, Paris, Ile de France, France
        Igor Dolgov Department of Psychology, New Mexico State University, Las Cruces, NM, USA
Dolores Frias-Navarro Faculty of Psychology, Department of Methodology of the Behavioural Sciences, University of Valencia, Valencia, Spain
        Peter Dorman Political Economy, Evergreen State College, Olympia, WA, USA
Pierre Dragicevic National Institute for Research in Computer Science and Automation (INRIA), Saclay, France
Joel A. Dubin Department of Statistics and Actuarial Science, and School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
        Carole Dufouil Bordeaux School of Public Health, Inserm “Bordeaux Population Health Center”, University of Bordeaux, Bordeaux, France
Richard P. Duncan Institute for Applied Ecology, University of Canberra, Canberra, ACT, Australia
        Daniel J. Dunleavy College of Social Work, Florida State University, Tallahassee, FL, USA
        William D. Dupont Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
        Bari Dzomba Temple University, Philadelphia, PA, USA
        Paul W. Eastwick Department of Psychology, University of California – Davis, Davis, CA, USA
        Peter Adriaan Edelsbrunner ETH Zürich, Zürich, Switzerland
        Erika M. Edwards Department of Mathematics and Statistics, University of Vermont, Burlington, VT, USA
        Orestis Efthimiou Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
        Vera Ehrenstein Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Linda Ejlskov National Center of Register-based Research, Aarhus University, Denmark
        Vanessa El Kamari Department of Infectious Diseases, Case Western Reserve University school of Medicine, Cleveland, Ohio, USA
        Kenneth J. Elgersma Department of Biology, University of Northern Iowa, Cedar Falls, IA, USA
        Denis-Alexander Engemann National Institute for Research in Computer Science and Automation (INRIA), Paris, France
        Arturo Erdely Facultad de Estudios Superiores Acatlan, Universidad Nacional Autonoma de Mexico, Naucalpan, Mexico
        Erik Barry Erhardt University of New Mexico, Albuquerque, NM, USA
        Thomas Fabbro Department of Clinical Research, University of Basel, Basel, Switzerland
        Anna Faino Seattle Children’s Research Institute, Seattle, Washington, USA
        Lee Jason Falin Brigham Young University – Idaho, Rexburg, ID, USA
Jonathan Falk Marginal Utility LLC, Rye, NY, USA (retired economist)
        Eddy Fan Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
Andrés Fandiño-Losada School of Public Health, Faculty of Health, Universidad del Valle, Cali, Colombia
        Tracey Farragher Integrated Interdisciplinary Innovations in Healthcare Science (i3HS) Hub, University of Manchester, Manchester, UK
        Oliver Faude Department of Sport, Exercise and Health, University of Basel, Switzerland
        Jonathan Fawcett Department of Psychology, Memorial University of Newfoundland, St. John’s, NL, Canada
        Ernst Fehr Department of Economics, University of Zurich, Zurich, Switzerland
        Fred M. Feinberg University of Michigan, Ann Arbor, MI, USA
        Morten Holm Jacobsen Fenger Department of Economics and Business Economics, Aarhus University, Denmark
        Ricardo M. Fernandes Clinical Pharmacology and Therapeutics, Faculty of Medicine, University of Lisbon, Portugal
        Eduardo Fernandez-Duque Department of Anthropology and School of Forestry and Environmental Studies, Yale University, New Haven, CT, USA
        David Ferreira Département d’Anesthésie-Réanimation Chirurgicale, CHRU Jean Minjoz, Besançon, France
        Mason Fidino Conservation & Science, Lincoln Park Zoo, Chicago, IL, USA
        Fiona Fidler School of Historical and Philosophical Studies & School of BioSciences, University of Melbourne, Melbourne, Australia
        Katherine L. Fielding London School of Hygiene & Tropical Medicine, London, UK
        Katarzyna Filimonow Institute of Genetics and Animal Breeding PAS, Jastrzębiec, Poland
Sarah Filippi Faculty of Medicine, School of Public Health, Imperial College London, London, UK
        Tommaso Filippini Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy
        Charles Fisher Unlearn.AI, Inc., San Francisco, CA, USA
Ane B. Fisker Bandim Health Project, Statens Serum Institut, Copenhagen, Denmark
        Aaron Fleishman Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA, USA
        Luisa Foco Institute for Biomedicine, Eurac Research, Bolzano, Italy
        James Foley Department of Zoology, University of Oxford, Oxford, UK
        Leonardo Ferreira Fontenelle Universidade Vila Velha, Vila Velha, Espírito Santo, Brazil
        Randi Foraker School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
        Alan J. Fossa Division of General Medicine, Beth Israel Deaconess Medical Center, Boston MA, USA
        Jean-Louis Foulley Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier, Montpellier, France
        Spencer J. Fox Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
        Hans T.W. Frankort Cass Business School, City, University of London, London, UK
Roger Frantz Department of Economics and Department of Psychology, San Diego State University, San Diego, CA, USA
        Matteo Fraschini Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
Ronald D. Fricker, Jr. Virginia Tech, Blacksburg, VA, USA
        Eiko I. Fried Department of Clinical Psychology, Leiden University, Leiden, The Netherlands
        David Funder Department of Psychology, University of California, Riverside, CA, USA
        Olivier Gaget Clinical Investigation Center, Montpellier University Hospital, Montpellier, France
David Gal College of Business Administration, University of Illinois at Chicago, Chicago, IL, USA
Manoel Galdino Transparência Brasil, São Paulo, São Paulo, Brazil
        Sandro Galea Boston University School of Public Health, Boston, MA, USA
        Jason R. Gantenberg Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
Emili García-Berthou Institute of Aquatic Ecology, University of Girona, Girona, Spain
        Eduardo Garcia-Garzon Facultad de Psicología, Universidad Autónoma de Madrid, Madrid, Spain
        Marc Gastonguay Metrum Research Group, Tariffville, CT, USA
        Simon Gates Cancer Research UK Clinical Trials Unit, University of Birmingham, Birmingham, UK
        Remi Gau Universite Catholique de Louvain, Louvain la Neuve, Belgium
        Andrew Gelman Department of Statistics and Department of Political Science, Columbia University, New York, NY, USA
        Richard C. Gerkin Arizona State University, School of Life Sciences, Tempe, AZ, USA
        Angela Gialamas School of Public Health, University of Adelaide, Adelaide, Australia
        Camila Gianella Department of Psychology, Chr. Michelsen Institute, Norway; Pontificia Universidad Católica del Perú, San Miguel, Peru
        Gerd Gigerenzer Max Planck Institute for Human Development, Berlin, Germany
        Dustin Gilbreath Caucasus Research Resource Centers (CRRC), Tbilisi, Georgia
        Antje Girndt Max Planck Institute for Ornithology, Seewiesen, Germany
        Hirofumi Go Department of Medical Statistics, Osaka City University, Osaka, Japan
        Joachim Goedhart Section Molecular Cytology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
        Megan Goldring Department of Psychology, Columbia University, New York, NY, USA
        Klara Goldstein Faculty of Biology, University of Warsaw, Biological & Chemical Research Center, Warsaw, Poland
        Juana Gómez-Benito Quantitative Psychology Unit, Faculty of Psychology, University of Barcelona, Barcelona, Spain
        Carlos E. Gonçalves Faculty of Sport Sciences, University of Coimbra, Coimbra, Portugal
        Katerina Gonzalez Zicklin School of Business, Baruch College, City University of New York, New York, NY, USA
        Nathan Goodman Lake Forest Park, WA, USA (retired computer scientist)
        Lucas Goossens Erasmus School for Health Policy & Management, Erasmus University, Rotterdam, The Netherlands
        Chandrasekar Gopalakrishnan Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
        Atsushi Goto Epidemiology and Prevention Group, Center for Public Health Sciences, National Cancer Center, Tokyo, Japan
        Marc-André Goulet School of Psychology, University of Ottawa, Canada
        Abraham Edgar Gracia-Ramos Local Health Research Committee, National Medical Center “La Raza”, IMSS, Mexico City, Mexico
        Jaimie L. Gradus Boston University School of Public Health, Boston, MA, USA
        Matthew J. Grainger Norwegian Institute for Nature Research, Trondheim, Norway
        Janet Grant School of Public Health, University of Adelaide, Adelaide, South Australia, Australia
        Edwin J. Green Department of Ecology, Evolution & Natural Resources, Rutgers University, New Brunswick, NJ, USA
        Donald P. Green Department of Political Science, Columbia University, New York, NY, USA
        Nathan Green Department of Infectious Disease Epidemiology, Imperial College London, UK
        Glenn M. Greenwald Consulting Wildlife & Fisheries Ecologist, US Fish & Wildlife Service, Palm Coast, FL, USA (retired biologist)
        Robert A. Greevy, Jr. Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
        Marian Grendar Biomedical Center Martin, Jessenius Faculty of Medicine, Comenius University, Slovakia
        James W. Grice Department of Psychology, Oklahoma State University, Stillwater, OK, USA
Ulrich Grüninger Institute for Social and Preventive Medicine, University of Bern, Bern, Switzerland
        Jérôme Guélat Swiss Ornithological Institute, Sempach, Switzerland
        Martin Eduardo Guerrero-Gimenez Laboratorio de Oncología, Instituto de Medicina y Biología Experimental de Cuyo CONICET – Mendoza, Mendoza, Argentina
        Shion Guha Department of Computer Science, Marquette University, Milwaukee, WI, USA
        Martin Gulliford School of Population Health and Environmental Sciences, King’s College, London, UK
        Paul Gustafson Department of Statistics, University of British Columbia, Vancouver, Canada
        József Gyurácz Department of Biology, Savaria Campus, Eötvös Lorand University, Szombathely, Hungary
        Dandara Haag School of Public Health, University of Adelaide, Adelaide, Australia
        Noah Haber Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
        Kristen A. Hahn IQVIA, Durham, NC, USA
        Brian D. Haig Department of Psychology, University of Canterbury, Christchurch, New Zealand
        Florian S. Halbeisen Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
        Brian J. Hall Department of Psychology, Global and Community Mental Health Research Group, University of Macau, Macao (SAR), China
        Owen S. Hamel Northwest Fisheries Science Center, NOAA, Seattle, WA, USA
Sandra Hamel UiT The Arctic University of Norway, Tromsø, Norway
        Geoff Hammond School of Psychological Science, University of Western Australia, Perth, Australia
        Johnni Hansen Danish Cancer Society Research Center, Copenhagen, Denmark
Karsten Theil Hansen Rady School of Management, University of California, San Diego, La Jolla, CA, USA
Steve Haroz National Institute for Research in Computer Science and Automation (INRIA), Saclay, France
        Sam Harper Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
        Frank Harrell Department of Biostatistics, Vanderbilt University, School of Medicine, Nashville, TN, USA
        Ewen M. Harrison Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK
        Wendy J. Harrison Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
        Tom Hartley Department of Psychology, University of York, York, UK
        Theresa Hastert Department of Oncology, Wayne State University School of Medicine, Detroit, MI, USA
        Elizabeth E. Hatch Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
        Julia E. Heck Department of Epidemiology, University of California – Los Angeles, Los Angeles, CA, USA
        Thomas Heckelei University of Bonn, Germany
        Uffe Heide-Jørgensen Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Georg Heinze Medical University of Vienna, Vienna, Austria
        Daniel P. Henriksen Clinical Pharmacology and Pharmacy, Department of Public Health, University of Southern Denmark, Funen, Denmark
        Miguel Hernan Harvard T.H. Chan School of Public Health, Boston, MA, USA
        Megan Higgs Montana State University, Bozeman, MT, USA
        Jennifer Hill PRIISM Center, Steinhardt School of Culture, Education and Human Development, New York University, New York, NY, USA
        John M. Hinson Department of Psychology, Washington State University, Pullman, WA, USA
        Norbert Hirschauer Agribusiness Management, Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
        Lan T. Ho-Pham Bone and Muscle Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
        Daniel Hoffmann Bioinformatics and Computational Biophysics, Faculty of Biology, University of Duisburg-Essen, Essen, Germany
        William Gary Hopkins Victoria University, Melbourne, Australia
        Aidan J. Horner Department of Psychology, University of York, York, UK
        Sam Horwich-Scholefield California Department of Public Health, Richmond, CA, USA
        Daniel J. Hruschka School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
        Jonathan Y. Huang Singapore Institute for Clinical Sciences, Singapore
        Raymond Hubbard College of Business and Public Administration, Drake University, Des Moines, IA, USA
        Tania B. Huedo-Medina Allied Health Sciences Department and Statistics Department, University of Connecticut, CT, USA
        Conor Hughes Department of Applied Economics, University of Minnesota, St. Paul, MN, USA
        Anders Huitfeldt Independent researcher, Oslo, Norway
        Patria A. Hume Auckland University of Technology, Sport Performance Research Institute New Zealand, Auckland, New Zealand
        Stuart H. Hurlbert Center for Inland Waters, San Diego State University, San Diego, CA, USA
        Krista F. Huybrechts Division of Pharmacoepidemiology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
        Nam Nhat Cong Huynh Department of Immunology, University of Tokyo, Tokyo, Japan
        Ulla Arthur Hvidtfeldt Danish Cancer Society Research Center, Copenhagen, Denmark
        Amiyaal Ilany Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
        Franco Milko Impellizzeri Human Performance Research Laboratory, Faculty of Health, University of Technology Sydney, Sydney, Australia
        Robin A. A. Ince Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
        Denis Infanger Department of Sports, Movement and Health, University of Basel, Basel, Switzerland
        Armin Ionescu-Hirsch The Steinhardt Museum of Natural History, Tel Aviv University, Tel Aviv, Israel
        Ewa Jabłońska Faculty of Biology, University of Warsaw, Warsaw, Poland
        Georg Jahn Department of Psychology, Chemnitz University of Technology, Chemnitz, Germany
        Yvonne Jansen Centre National de la Recherche Scientifique (CNRS), Sorbonne Université, Paris, France
        Armina Janyan Research Center for Cognitive Science, New Bulgarian University, Sofia, Bulgaria
        Greg Jensen Department of Psychology, Columbia University, New York, NY, USA
        Thomas Bo Jensen Department of Clinical Pharmacology, Copenhagen University Hospital – Bispebjerg, Copenhagen, Denmark
        Minjeong Jeon Social Research Methodology, Graduate School of Education & Information Studies, University of California – Los Angeles, Los Angeles, CA, USA
Stefan Johansson Clinical Epidemiology Unit, Department of Clinical Science, Karolinska Institutet, Stockholm, Sweden
        Kate M. Johnson Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
        Michael Johnson Dept. of Management and Organization, Foster School of Business, University of Washington, Seattle, WA, USA
        Paul Johnson WildCRU, Recanati Kaplan Centre, Zoology, University of Oxford, Oxford, UK
        Luke W. Johnston Department of Public Health, Aarhus University, Aarhus, Denmark
        Phillip Jolly School of Hospitality Management, Pennsylvania State University, University Park, PA, USA
        Pascal Jordan Department of Psychology, University of Hamburg, Hamburg, Germany
        Ants Kaasik Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
        Conrad Kabali Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
Bryar Kadir Cancer Research UK Clinical Trials Unit, University of Birmingham, Birmingham, UK
Gwenaël Kaminski Université de Toulouse and Centre National de la Recherche Scientifique (CNRS), Toulouse, France
        Patrick C. Kaminski Indiana University Bloomington, Department of Sociology, Center for Complex Networks and Systems Research, Bloomington, IN, USA
        André Karch Institute for Epidemiology and Social Medicine, University of Münster, Münster, Germany
        Dirk Nikolaus Karger Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
        Ali Karimnezhad Department of Biochemistry, Microbiology, and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
        Srinivasa Vittal Katikireddi MRC/CSO Social & Public Health Sciences Unit, University of Glasgow, Glasgow, Scotland, UK
        Jay S. Kaufman Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
        James A. Kaye RTI Health Solutions, Waltham, MA, USA
        Alexander Keil Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
        Aayush Khadka Harvard Graduate School of Arts and Sciences, Global Health and Population Department, Cambridge, MA, USA
        Quynh Long Khuong Center for Population Health Sciences, Hanoi University of Public Health, Hanoi, Vietnam
        Ali Kiadaliri Clinical Epidemiology Unit, Department of Clinical Sciences, Lund University, Lund, Sweden
        Henk Kiers Department of Psychology, University of Groningen, Groningen, The Netherlands
        Dae Kim Hebrew SeniorLife, Boston, MA, USA
        Min-Hyung Kim Harvard T.H. Chan School of Public Health, Boston, MA, USA
        Seoyoung Kim Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
        Peter Klaren Department of Animal Ecology & Physiology, Institute for Water and Wetland Research, Radboud University, Nijmegen, The Netherlands
        E. David Klonsky Department of Psychology, University of British Columbia, Vancouver, BC, Canada
        John L. Kmetz University of Delaware, Newark, DE, USA
        Emma Knight School of Public Health, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
        Sven Knüppel Department of Nutrition and Gerontology (ERG), German Institute of Human Nutrition Potsdam-Rehbrücke (DIfE), Nuthetal, Germany
        Maximilian Köppel Department of Prevention & Rehabilitation, Institute for Sport and Sportscience, University of Heidelberg, Germany
        Konrad Paul Kording Departments of Neuroscience and Biomedical Engineering, University of Pennsylvania, Philadelphia, PA, USA
        Fränzi Korner-Nievergelt oikostat GmbH, Ettiswil, Switzerland
        Koji E. Kosugi School of Human Sciences, Senshu University, Tokyo, Japan
        Wiktor Kotowski Faculty of Biology, University of Warsaw, Poland
        Johan Kotze Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
        Gilles Kratzer Department of Mathematics, University of Zürich, Zürich, Switzerland
        Nate Kratzer Brown-Forman Data Science, Louisville, KY, USA
        Jacob K. Kresovich Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
        Andreas Kreutzer Department of Kinesiology, Texas Christian University, Fort Worth, TX, USA
        Jesse Krijthe Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
        Lars E. Kroll Zentralinstitut für die Kassenärztliche Versorgung, Berlin, Germany
        Kasper Krommes Sports Orthopedic Research Center, Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
        Robert Kubinec Woodrow Wilson School, Princeton University, Princeton, NJ, USA
        Claudia E. Kuehni Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
        Oliver Kuss Institute for Biometrics and Epidemiology, German Diabetes Center, Düsseldorf, Germany
Ilse B. Labuschagne School of Audiology and Speech Sciences, University of British Columbia, Vancouver, BC, Canada
        Martin Lachmair Leibniz-Institut fuer Wissensmedien, Tuebingen, Germany
        Mark H. C. Lai Department of Psychology, University of Southern California, Los Angeles, CA, USA
        Jessica E. Laine Department of Epidemiology and Biostatistics, Imperial College, London, UK
        Daniel L. Lakeland Lakeland Applied Sciences LLC, Altadena, CA, USA
        Iasonas Lamprianou Department of Social and Political Sciences, University of Cyprus, Nicosia, Cyprus
        Markus Landolt University Children’s Hospital Zürich, Zürich, Switzerland
        Jonas W. B. Lang Department of Personnel Management, Work and Organizational Psychology, Ghent University, Ghent, Belgium
        Junpeng Lao Département de Psychologie, Université de Fribourg, Fribourg, Switzerland
        Osvaldo Lara-Sarabia Neurology Unit, Clínica de la Costa and Department of Clinical Epidemiology, Universidad del Norte, Barranquilla, Colombia
Timothy L. Lash Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
        Deborah A. Lawlor MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
        Aaron Lawson McLean Department of Neurosurgery, Jena University Hospital, Jena, Germany
        Nina Lazarevic School of Public Health, Faculty of Medicine, The University of Queensland, Brisbane, Australia
        Nam Le Quang The Key Laboratory of Animal Cell Technology, National Institute of Animal Sciences, Hanoi, Vietnam
        Nhat Nam Le Dong R&D Team, Medisoft MGCd, Sorinnes, Belgium
        David J. Lederer Columbia University Irving Medical Center, New York, NY, USA
        Ruben D. Ledesma IPSIBAT, Consejo Nacional de Investigaciones y Técnicas, Universidad Nacional de Mar del Plata, Buenos Aires, Argentina
Alison Ledgerwood Department of Psychology, University of California – Davis, Davis, CA, USA
John D. Lee Department of Industrial and Systems Engineering, University of Wisconsin – Madison, Madison, WI, USA
        Dale Lehman Center for Business Analytics, Loras College, Dubuque, IA, USA
        Luigi Leone University of Rome “Sapienza”, Department of Social and Developmental Psychology, Rome, Italy
        Richard A. Levine Department of Mathematics and Statistics, San Diego State University, San Diego, CA, USA
        Drew Griffin Levy GoodScience, Inc. Mountain View, CA, USA
        Li Li Georgia Southern University, Statesboro, GA, USA
        Thoralf Randolph Liebs Department of Pediatric Surgery, Inselspital, University of Bern, Bern, Switzerland
Roberto Limongi Robarts Research Institute, University of Western Ontario, London, ON, Canada
        Winston Lin Department of Statistics and Data Science, Yale University, New Haven, CT, USA
        Jonas Kristoffer Lindeløv Center for Cognitive Neuroscience, Department of Communication and Psychology, Aalborg University, Denmark
        Emma Link Centre for Biostatistics and Clinical Trials, Peter MacCallum Cancer Centre, Melbourne, Australia
        Simeon Lisovski Swiss Ornithological Institute, Sempach, Switzerland
        Marco Tullio Department of Medical and Surgical Sciences, “Magna Graecia” University of Catanzaro, Catanzaro, Italy
        Melvin Livingston Department of Behavioral Sciences and Health Education, Rollins School of Public Health, Emory University, Atlanta, GA, USA
        D. E. Huw Llewelyn Department of Mathematics, Aberystwyth University, Aberystwyth, UK
        Joseph J. Locascio Harvard Medical School, Massachusetts General Hospital, Boston, MA, USA
        Yiska Loewenberg Weisband Braun School of Public Health, The Hebrew University, Jerusalem, Israel
        Eric T. Lofgren Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, USA
        Rosaria Lombardo Department of Economics, University of Campania “Luigi Vanvitelli”, Capua, Italy
        Richard B. Lopez Department of Psychological Sciences, Rice University, Houston, TX, USA
        Sarah J. Lord Epidemiology and Medical Statistics, School of Medicine, University of Notre Dame, Sydney, NSW, Australia
        Lilla Lovász Zoological Institute, University of Basel, Basel, Switzerland
        Jessica Love Northwestern University, Evanston, IL, USA
        Jonathan Lu Department of Computer Science, Princeton University, Princeton, NJ, USA
        Daniel Lüdecke Department of Medical Sociology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
        Brad Luen Department of Statistics, Indiana University, Bloomington, IN, USA
        Stuart Luppescu UChicago Consortium on School Research, University of Chicago, Chicago, IL, USA
        Courtney D. Lynch College of Medicine, The Ohio State University, Columbus, OH, USA
        Jay Lynch Pearson Research, University of Colorado, Denver, CO, USA
        John W. Lynch School of Public Health, University of Adelaide, Australia
        Alice Jessie Clark Lyth Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
        Christopher R. Madan School of Psychology, University of Nottingham, Nottingham, UK
        Sreenath Madathil Faculty of Dentistry, McGill University, Montreal, QC, Canada
        Eslam Maher Department of Clinical Research, Children’s Cancer Hospital Egypt, Cairo, Egypt
        Evan Majic Department of Economics, Princeton University, Princeton, NJ, USA
        Thomas Mani Department of Environmental Science, University of Basel, Basel, Switzerland
        Thea K. Mannix Center for Cognitive Neuroscience, Department of Communication and Psychology, Aalborg University, Aalborg, Denmark
        Mohammad Ali Mansournia Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
        Raoul Mansukhani Clinical Trials Unit, London School of Hygiene & Tropical Medicine, London, UK
        David J. Marcus InterSystems Corporation, Somerville, MA, USA
        Anne Margarian Thuenen-Institute for Rural Studies, Braunschweig, Germany
        Andrea V. Margulis Research Triangle Institute, Research Triangle Park, NC, USA
        Gabriele Mari Department of Public Administration and Sociology, Erasmus University Rotterdam, Rotterdam, The Netherlands
        Jean-Michel Marin Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier, CNRS, Montpellier, France
        Daniele Marinazzo Department of Data Analysis, Faculty of Psychological and Educational Sciences, Ghent University, Ghent, Belgium
        Michael Marks Department of Psychology, New Mexico State University, Las Cruces, NM, USA
        Fernando Marmolejo-Ramos School of Psychology, University of Adelaide, Adelaide, Australia
        Laura Martignon Department of Mathematics and Computer Science, University of Education, Ludwigsburg, Germany
        Osvaldo Antonio Martin Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Instituto de matemática aplicada San Luis, San Luis, Argentina
        Stephen R. Martin Department of Psychology, University of California – Davis, Davis, CA, USA
        Gonzalo Martínez-Alés Department of Epidemiology, Columbia University School of Public Health, New York, NY, USA
        Ana Martinovici Rotterdam School of Management, Erasmus University, Rotterdam, The Netherlands
        Ben Marwick Department of Anthropology, University of Washington, Seattle, WA, USA
        Piotr Mariusz Maszczyk Department of Hydrobiology, University of Warsaw, Warsaw, Poland
        Maya Mathur Department of Epidemiology, Harvard University, Boston, MA, USA
        Norman Matloff Department of Computer Science, University of California – Davis, Davis, CA, USA
        Robert A. J. Matthews Department of Mathematics, Aston University, Birmingham, UK
        Nicholas J. Matzke School of Biological Sciences, University of Auckland, Auckland, New Zealand
        Dimitris Mavridis Department of Primary Education, University of Ioannina, Ioannina, Greece
        Raffaele Mazzolari Department of Physical Education and Sport, University of the Basque Country (UPV/EHU), Vitoria-Gasteiz, Spain
        Lawrence McCandless Faculty of Health Sciences, Simon Fraser University, Vancouver, BC, Canada
        Wyatt J. McDonnell Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
        Ryan J. McGill William & Mary School of Education, Williamsburg, VA, USA
        Brian McKay Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA, USA
        Jonathon McPhetres Clinical and Social Sciences in Psychology, University of Rochester, Rochester, NY, USA
        Roberto Melotti Institute for Biomedicine, Eurac Research, Bolzano, Italy
        Dieter Menne Menne Biomed Consulting, Tübingen, Germany
        Dan Mennill Department of Biological Sciences, University of Windsor, Windsor, ON, Canada
        Brittany K. Mercado Love School of Business, Elon University, Elon, NC, USA
        Lotte Meteyard School of Psychology & Clinical Language Sciences, University of Reading, Reading, UK
        Nicolas Meyer Laboratoire de Biostatistique, Faculté de médecine de Strasbourg, Strasbourg, France
        George Michaelides Norwich Business School, University of East Anglia, Norwich, UK
        Martin Christian Michel Department of Pharmacology, Johannes Gutenberg University, Mainz, Germany
        Matthew W. Miller Auburn University, Auburn, AL, USA
        William C. Miller Division of Epidemiology, The Ohio State University, Columbus, OH, USA
        Andrew J. Milne The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia
        Hiroaki Minato Data scientist, Washington, DC, USA
        J. Jaime Miranda CRONICAS Center of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
        Hitesh Mistry Division of Pharmacy, Cancer Sciences, University of Manchester, Manchester, UK
        Murthy N. Mittinty School of Public Health, University of Adelaide, Adelaide, Australia
        Judith Moeller Amsterdam School of Communication Research, University of Amsterdam, Amsterdam, The Netherlands
        Giusi Moffa Institute for Clinical Epidemiology and Biostatistics, University of Basel, Basel, Switzerland
        Graeme Moffat Interaxon Inc.; Munk School of Global Affairs and Public Policy, University of Toronto, Toronto, ON, Canada
        Alicia Montgomerie School of Public Health, University of Adelaide, Adelaide, Australia
        Valter Moreno Rio de Janeiro State University (UERJ), Rio de Janeiro, Brazil
        Craig Morgan King’s College London, London, UK
        Laust Hvas Mortensen Department of Public Health, University of Copenhagen, Copenhagen, Denmark
        Harvey Motulsky GraphPad Software, Los Angeles, CA, USA
        Harry Moultrie University of the Witwatersrand, Johannesburg, South Africa
        Shabnam Mousavi Max Planck Institute for Human Development, Berlin, Germany
        Pavlos Msaouel MD Anderson Cancer Center, The University of Texas, Houston, TX, USA
        Shubhabrata Mukherjee Department of Medicine, University of Washington, Seattle, WA, USA
        Gerben Mulder Department of Language, Literature, and Communication, Faculty of Humanities, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
        John Mullahy Department of Population Health Sciences, University of Wisconsin-Madison, Madison, WI, USA
        Damian R. Murray Department of Psychology, Tulane University, New Orleans, LA, USA
        Faisal Mushtaq School of Psychology, University of Leeds, Leeds, UK
        Oliver Mußhoff Farm Management Group, Department of Agricultural Economics and Rural Development, Georg-August-Universität Göttingen, Göttingen, Germany
        Daniel Myall New Zealand Brain Research Institute, Christchurch, New Zealand
        Chisato Nagai Department of Biology, Nagoya University, Nagoya, Japan
        Mehdi Najafzadeh Divison of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
        Ladislas Nalborczyk Université Grenoble Alpes, CNRS, LPNC, Grenoble, France
        Manjari Narayan School of Medicine, Stanford University, Palo Alto, CA, USA
        Stephen Nash London School of Hygiene & Tropical Medicine, London, UK
        Khalidha Nasiri Department of Epidemiology and Biostatistics, McGill University, Montreal, QC, Canada
        Guy Nason School of Mathematics, University of Bristol, Bristol, UK
        Jeffrey Negrea Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
        Leslie New Department of Mathematics and Statistics, Washington State University, Vancouver, WA, USA
        Ian R. Newby-Clark Department of Psychology, University of Guelph, Guelph, ON, Canada
        Hao Nguyen Si Anh Institute of Preventive Medicine and Public Health, Hanoi Medical University, Hanoi, Vietnam
        Hoang Hiep Nguyen Department of Pediatrics, Rush Medical College, Chicago, IL, USA
        Nguyen Dinh Nguyen First Care Medical Centre, Bradbury, Australia
        Van Truong Nguyen University of Transport and Communication, Hanoi, Vietnam; Japan Transport and Tourism Research Institute, Tokyo, Japan
        Phong Thanh Nguyen Department of Project Management, Ho Chi Minh City Open University (HCMCOU), Vietnam
        Tan-Trung Nguyen Institut Jean-Pierre Bourgin, INRA Centre de Versailles-Grignon, Versailles, France
        Tri-Long Nguyen Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
        Tuan V. Nguyen Garvan Institute of Medical Research, Sydney; School of Biomedical Engineering, University of Technology Sydney, Sydney, Australia
        Van Hoan Nguyen Department of Infectious Diseases, Hai Phong Medical and Pharmacy University, Hai Phong, Vietnam
        Hiep Nguyen Canh Department of Human Pathology, Kanazawa University Graduate School of Medicine, Kanazawa, Japan
        Stefan Nickels Department of Ophthalmology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
        Frank Niemeyer Institute of Orthopaedic Research and Biomechanics, University Hospital Ulm, Ulm, Germany
        Adriani Nikolakopoulou Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
        Mette Nørgaard Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Daniel Nuño Signal Theory and Communications Department, Universitat Politecnica de Catalunya (UPC), Barcelona, Spain
        Emily C. O’Brien Duke University School of Medicine and Duke Clinical Research Institute, Durham, NC, USA
        Annette M O’Connor Department of Veterinary Diagnostic and Production Animal Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, USA
        Brendan O’Connor Department of Neuroscience, Psychology and Behaviour, University of Leicester, Leicester, UK
        Keith O’Rourke O’Rourke Consulting, Ottawa, ON, Canada
        Jonas Obleser Department of Psychology, University of Lübeck, Lübeck, Germany
        Andrew O. Odegaard Department of Epidemiology, University of California – Irvine, Irvine, CA, USA
        Anobel Y. Odisho Department of Urology, School of Medicine, University of California – San Francisco, San Francisco, CA, USA
        Tabatha Offutt-Powell Epidemiology, Health Data, and Informatics Section, Delaware Division of Public Health, Dover, DE, USA
        Elizabeth L. Ogburn Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
        In-Sue Oh Department of Human Resource Management, Fox School of Business, Temple University, Philadelphia, PA, USA
        Rohit P. Ojha Center for Outcomes Research, JPS Health Network, Fort Worth, TX, USA
        Jake Olivier School of Mathematics and Statistics, University of New South Wales, Sydney, Australia
        Per Olsson Gisleskog POG Pharmacometrics, London, UK
        Deniz S. Ones Department of Psychology, University of Minnesota, MN, USA
        Raydonal Ospina Department of Statistics, CAST Laboratory, Universidade Federal de Pernambuco, Recife, Brazil
        Youssef Oulhote Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, MA, USA
        Thure Filskov Overvad Aalborg Thrombosis Research Unit, Department of Cardiology, Aalborg University Hospital, Aalborg, Denmark
        Philipp G. Packmohr PPDS Data Science Consulting, Weilersbach, Germany
        Matthew J. Page School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
        Adam Palayew Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
        Lyle J. Palmer School of Public Health, University of Adelaide, Adelaide, Australia
        Orestis Panagiotou Department of Health Services, Policy & Practice, Brown University School of Public Health, Providence, RI, USA
        Mauro Panigada Fondazione IRCCS Ca’ Granda, Ospedale Maggiore Policlinico, Milano, Italy
        Dominik Papies School of Business and Economics, University of Tuebingen, Tuebingen, Germany
        Francisco J. Parada Laboratorio de Neurociencia Cognitiva y Social, Universidad Diego Portales, Santiago, Chile
        Tae Youn Park Owen Graduate School of Management, Vanderbilt University, Nashville, TN, USA
        Tim Parker Biology Department, Whitman College, Walla Walla, WA, USA
        Rahul A. Parsa Ivy College of Business, Iowa State University, Ames, IA, USA
        Juan Pascual-Llobell Facultad Psicologia, Universitat de Valencia, Valencia, Spain
        Marcos Pascual-Soler ESIC Business & Marketing School, Valencia, Spain
        Mehul D. Patel Department of Emergency Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
        Cristian Pattaro Institute for Biomedicine, Eurac Research, Bolzano, Italy
        Francesco Pauli Department of Economics, Business, Mathematics and Statistics, University of Trieste, Italy
        Neil Pearce London School of Hygiene and Tropical Medicine, London, UK
        Carl C. Peck University of California at San Francisco (UCSF), San Francisco, CA, USA
        Asger Pedersen Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Aarhus, Denmark
        Robert K. Peet University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
        Ido Pen University of Groningen, Groningen, The Netherlands
        Marco Peña-Jimenez Laboratory of Psychology, University of Bordeaux, Bordeaux, France
        Pedro Pereira Institute of Biotechnology, University of Helsinki, Helsinki, Finland
        Susana Perez-Gutthann Epidemiology, Barcelona, Spain
        Jose D. Perezgonzalez School of Aviation, Massey Business School, Massey University, Palmerston North, New Zealand
        Charles Perin Department of Computer Science, University of Victoria, Victoria, BC, Canada
        Alex Perusse City Year Inc., Boston, MA, USA
        Oliver L. Pescott NERC Centre for Ecology & Hydrology, Wallingford, Oxfordshire, UK
        Alberto Pessia Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
        Luiz Pessoa Department of Psychology and Maryland Neuroimaging Center, University of Maryland, College Park, MD, USA
        Raphael S. Peter Institute of Epidemiology and Medical Biometry, Ulm University, Ulm, Germany
        Irene Petersen UCL, Department of Primary Care & Population Health, UK
        Karl J. Petersen Cell Biology and Cancer UMR144, Institut Curie – CNRS, Paris, France
        Kyle Peyton Department of Political Science, Yale University, New Haven, CT, USA
        Peter L. Phalen Division of Psychiatric Services Research, School of Medicine, University of Maryland, Baltimore, MD, USA
        Tom Philippi National Park Service, US Department of the Interior, San Diego, CA, USA
        Rhiannon Pilkington School of Public Health, University of Adelaide, Adelaide, Australia
        Ricardo Pizarro Montreal Neurological Institute, Montreal, QC, Canada
        Robert Platt Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, QC, Canada
        Richard Podkolinski Independent data scientist, The Hague, The Netherlands
        Anna R. Poetsch St. Anna Children’s Cancer Research Institute, Vienna, Austria
        Morgan Poor Department of Marketing, San Diego State University, San Diego, CA, USA
        Anton Pottegård Clinical Pharmacology and Pharmacy, Department of Public Health, University of Southern Denmark, Odense, Denmark
        Patricia Priest Department of Preventive and Social Medicine, University of Otago, Dunedin, New Zealand
        Daniel Prieto-Alhambra Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK
        Tahira M. Probst Department of Psychology, Washington State University, Vancouver, WA, USA
        Ben Prytherch Department of Statistics, Colorado State University, Fort Collins, CO, USA
        Pierre Pudlo Institut de Mathématiques de Marseille, Aix-Marseille University, Marseille, France
        Kenneth L. Quarrie New Zealand Rugby, Wellington, New Zealand
        Sridharan Raghavan University of Colorado School of Medicine, Department of Medicine, Aurora, CO, USA
        Pamela J. Rakhshan Rouhakhtar Department of Psychology, University of Maryland, Baltimore, MD, USA
        Sean Raleigh Westminster College, Salt Lake City, UT, USA
        Jonas Ranstam Department of Clinical Sciences, Lund University, Lund, Sweden
        Sarah Rathwell Department of Medicine, University of Alberta, Edmonton, AB, Canada
        Peter F. Rebeiro Departments of Biostatistics and of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
        Felix G. Rebitschek Max Planck Institute for Human Development, Berlin, Germany
        Daniel Redhead Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
        Dorota Reis Department of Psychology, Saarland University, Germany
        Aleksi Reito Tampere University Hospital, Tampere, Finland
        Alexandra Restrepo Henao Epidemiology Research Group, National School of Public Health, Medellin, Antioquia, Colombia
        Lorenzo Richiardi Department of Medical Sciences, University of Turin, Italy
        Corinne A. Riddell University of California, Berkeley School of Public Health, Berkeley, CA, USA
        Jennifer R. Rider Boston University School of Public Health, Boston, MA, USA
        Andreas Rieckmann Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
        Eike Mark Rinke School of Politics and International Studies, University of Leeds, Leeds, UK
        Pandu Riono Universitas Indonesia, Faculty of Public Health, Dept. of Biostatistics & Population Studies, Depok, West-Java, Indonesia
        Julien Riou Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
        Robert Ritz LETU Mongolia program, Ider University, Ulaanbaatar, Mongolia
        Doug Robb School of Psychology and Exercise Science, Murdoch University, Murdoch, Australia
        Christian Robert Université Paris Dauphine, Paris, France
        Norbert Röder Thünen-Institute of Rural Studies, Federal Research Institute for Rural Areas, Forestry and Fisheries, Braunschweig, Germany
        José Ángel Rodrigo-Pendás Preventive Medicine and Epidemiology Department, Vall d’Hebron University Hospital, Barcelona, Spain
        Alex Miranda Rodrigues IMEPAC School of Medicine, Araguari, Brazil
        David A. Rodríguez-Medina Faculty of Psychology, National Autonomous University of Mexico, Mexico City, Mexico
        Marco A. Rodríguez Département des sciences de l’environnement, Université du Québec à Trois-Rivières, Trois-Rivières, QC, Canada
        Neal Roese Kellogg School of Management, Northwestern University, Evanston, IL, USA
        Daniel Rojas-Líbano Laboratorio de Neurociencia Cognitiva y Social, Universidad Diego Portales, Santiago, Chile
        Megan E. Romano Department of Epidemiology, Geisel School of Medicine, Lebanon, New Hampshire, USA
        Xavier Romão Faculty of Engineering of the University of Porto, Porto, Portugal
        Jens Rommel Swedish University of Agricultural Sciences, Uppsala, Sweden
        Jason M.T. Roos Rotterdam School of Management, Erasmus University, Rotterdam, The Netherlands
        Les Rose Pharmavision Consulting Ltd, Salisbury, UK (retired clinical research scientist)
        Alejandra Rossi Laboratorio de Neurociencia Cognitiva y Social, Universidad Diego Portales, Santiago, Chile
        Marios Rossides Clinical Epidemiology Division, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
        Tobias Roth Zoological Institute, University of Basel, Basel, Switzerland
        Emily F. Rothman Community Health Sciences Department, Boston University School of Public Health, Boston, MA, USA
        Kenneth J. Rothman Research Triangle Institute, Research Triangle Park, NC; Boston University, Boston, MA, USA
        Hannah Rothstein Zicklin School of Business, Baruch College, City University of New York, New York, NY, USA
        Jonathan Rougier School of Mathematics, University of Bristol, Bristol, UK
        Guillaume A. Rousselet Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, Scotland, UK
        Stephen J. Ruberg Independent Researcher, Analytix Thinking LLC, Indianapolis, IN, USA
        Nils-Petter Rudqvist Department of Radiation Oncology, Weill Cornell Medicine, New York, NY, USA
        Ludwig Ruf Institute of Sports and Preventive Medicine, Saarland University, Saarbrücken, Germany
        Susana Ruiz-Fernández FOM-Hochschule für Oekonomie & Management, Essen; Leibniz-Institut für Wissensmedien, Tübingen, Germany
        Elizabeth G. Ryan Cancer Research UK Clinical Trials Unit, University of Birmingham, Birmingham, UK
        Robin J. Ryder CEREMADE, Université Paris-Dauphine, Paris, France
        Georgia Salanti Institute of Social and Preventive Medicine, University of Bern, Switzerland
        Peter Samai Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
        Alfredo Sánchez-Tójar Department of Evolutionary Biology, Bielefeld University, Bielefeld, Germany
        Pedro Henrique Ribeiro Santiago Adelaide Dental School, The University of Adelaide, Adelaide, Australia
        Susanne Sattler National Heart and Lung Institute, Imperial College London, London, UK
        David A. Savitz Department of Epidemiology, Brown University, School of Public Health, Providence, RI, USA
        George M. Savva Quadram Institute Bioscience, Norwich, Norfolk, UK
        Andrew M. Sayer NASA Goddard Space Flight Center Ocean Ecology Laboratory, Universities Space Research Association, Columbia, MD, USA
        Peter Scarth School of Earth and Environmental Sciences, The University of Queensland, Queensland, Australia
        Thomas Schäfer Department of Psychology, Chemnitz University of Technology, Chemnitz, Germany
        Mark E. Schaffer School of Social Sciences, Heriot-Watt University, Edinburgh, Scotland, UK
        Daniel Scharfstein Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
        Clyde B. Schechter Department of Family & Social Medicine, Albert Einstein College of Medicine, Bronx, NY, USA
        Joanna Schellenberg London School of Hygiene & Tropical Medicine, London, UK
        Chris Schmader Office of Assessment, Loyola Marymount University, Los Angeles, CA, USA
        Alexandra M. Schmidt Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
        Frank L. Schmidt Department of Management and Organizations, Tippie College of Business, University of Iowa, Iowa City, IA, USA
        Morten Schmidt Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Tim Schmoll Evolutionary Biology, Bielefeld University, Bielefeld, Germany
        Jesper W. Schneider Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Aarhus, Denmark
        Julia Schroeder Department of Life Sciences, Imperial College London, London, UK
        Helena Silveira Schuch School of Public Health, University of Adelaide, Adelaide, Australia
        George R. Seage III Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
        Ricardo Segurado School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland
        Katsutoshi Seki Natural Science Laboratory, Toyo University, Tokyo, Japan
        Andrea Serafino The Bank of England, London, UK
        Shira Shafir Department of Epidemiology, Fielding School of Public Health, University of California – Los Angeles, Los Angeles, CA, USA
        Zach Shahn IBM Research, Yorktown Heights, NY, USA
        Akshay Sharma Pediatric Bone Marrow Transplantation and Cellular Therapy, St. Jude Children’s Research Hospital, Memphis, TN, USA
        Martin Shepperd Department of Computer Science, Brunel University London, Uxbridge, UK
        Jeffrey Sherman Department of Psychology, University of California – Davis, Davis, CA, USA
        Koichiro Shiba Harvard T.H. Chan School of Public Health, Boston, MA, USA
        Riti Shimkhada University of California – Los Angeles, Los Angeles, CA, USA
        Chelsea L. Shover Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
        Mikhail Shubin Department of Computer Science, University of Helsinki, Helsinki, Finland
        Jonne Sikkens Department of Internal Medicine, Amsterdam UMC, Amsterdam, The Netherlands
        Harminder Singh Department of Business Information Systems, Faculty of Business, Economics & Law, Auckland University of Technology, Auckland, New Zealand
        Marlena Siwiak Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
        Nils Skajaa Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Daniel Slade Cancer Research Clinical Trials Unit, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
        Tom Smekens Department of Public Health, Institute of Tropical Medicine, Antwerp, Belgium
        Richard J. Smith Department of Anthropology, Washington University in St. Louis, St. Louis, MO, USA
        Lisa Smithers School of Public Health, University of Adelaide, Adelaide, Australia
        Tim Smits KU Leuven Institute for Media Studies, Leuven, Belgium
        Klára Soltész-Várhelyi Institute of Psychology, Pázmány Péter Catholic University, Budapest, Hungary
        Lukas Sönning English Linguistics Department, University of Bamberg, Germany
        Henrik Toft Sørensen Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Signe Sørup Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus; Bandim Health Project, Statens Serum Institut, Copenhagen, Denmark
        Michael Spagat Department of Economics, Royal Holloway University of London, London, UK
        Seth M. Spain John Molson School of Business, Concordia University, Montreal, QC, Canada
        Rodney Sparapani Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, USA
        David Spiegelhalter Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Cambridge, UK
        Manuel Spínola International Institute in Wildlife Conservation and Management, Universidad Nacional, Heredia, Costa Rica
        Aaron Springford Forestry Statistics, Weyerhaeuser, Seattle, WA, USA
        Andreas Stang Institute of Medical Informatics, University Hospital Essen, Essen, Germany
        E. Ashley Steel Department of Statistics, University of Washington, Seattle, WA, USA
        Johan Steen Department of Intensive Care Medicine, Ghent University Hospital, Ghent, Belgium
        Ida Bo Steendahl Institute of Sports and Preventive Medicine, Saarland University, Saarbrücken, Germany
        Joseph Paul Stemberger Department of Linguistics, University of British Columbia, Vancouver, BC, Canada
        Andreea Steriu Public health doctor, Bucharest, Romania
        Jonathan Sterne Department of Population Health Sciences, University of Bristol, Bristol, UK
        Gavin Stewart Evidence Synthesis Lab, University of Newcastle-Upon-Tyne, Newcastle, UK
        Morgan E. Stewart Morgan Stewart Consulting, Orangeburg, SC, USA
        Steven D. Stovitz Department of Family Medicine and Community Health, University of Minnesota, Minneapolis, MN, USA
        Alex Stringer Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
        Oliver C. Stringham School of Biological Sciences, The University of Adelaide, Adelaide, Australia
        Jan Strnadel Biomedical Center Martin, Jessenius Faculty of Medicine, Comenius University, Martin, Slovakia
        Wolfgang Stroebe Faculty of Social Sciences, University of Groningen, Groningen, The Netherlands
        Donald R. Strong Department of Evolution and Ecology, University of California – Davis, Davis, CA, USA
        Jonáh J. Stunt Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
        Til Stürmer Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
        Sherri Stuver Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
        Krishna Subramanian RWTH Aachen University, Aachen, Germany
        Claudia Kimie Suemoto Division of Geriatrics, University of Sao Paulo Medical School, Sao Paulo, Brazil
        Matthew A. Summers Bone Biology Division, The Garvan Institute of Medical Research, Sydney, Australia
        Xiaoran Sun Department of Human Development and Family Studies, The Pennsylvania State University, University Park, PA, USA
        Dénes Szücs Department of Psychology, University of Cambridge, Cambridge, UK
        Lívia Takács F. Hoffmann-La Roche Ltd, Basel, Switzerland
        Poorna Talkad Sukumar College of Engineering, University of Notre Dame, Notre Dame, IN, USA
        Béla Tallósi Hortobágy National Park Directorate, Debrecen, Hungary
        Cedric Kai Wei Tan Department of Zoology, Oxford University, Oxfordshire, UK
        Mauricio Tejo Facultad de Ciencias Naturales y Exactas, Universidad de Playa Ancha, Valparaíso, Chile
        Jorge N. Tendeiro Department of Psychometrics and Statistics, Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, The Netherlands
        Peter W. G. Tennant Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
        Wesley Thompson Family Medicine and Public Health, University of California – San Diego, La Jolla, CA, USA
        Lau Caspar Thygesen National Institute of Public Health, University of Southern Denmark, Copenhagen, Denmark
        Sanne Marie Thysen Bandim Health Project, Statens Serum Institut, Copenhagen, Denmark
        Hans Tierens Work and Organisation Studies, Faculty of Economics and Business, KU Leuven, Leuven, Belgium
        Sigal Tifferet Department of Business Administration, Ruppin Academic Center, Emek Hefer, Israel
        Sengwee Toh Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
        Fernando H. Toledo International Maize and Wheat Improvement Center, Mexico, Mexico
        Sara Tomiolo Department of Bioscience, Aarhus University, Silkeborg, Denmark
        George Tomlinson Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
        Martin A. Tönz Department of Pediatric Surgery, Children’s University Hospital, Bern, Switzerland
        David Trafimow Department of Psychology, New Mexico State University, Las Cruces, New Mexico, USA
        Thach Tran Bone Biology Division, The Garvan Institute of Medical Research, Sydney, Australia
        Trung M. Tran Department of Cardiology, Kien Giang General Hospital, Kien Giang, Vietnam
        Sven Trelle CTU Bern, University of Bern, Bern, Switzerland
        Patrizio Tressoldi Department of General Psychology, Università degli Studi di Padova, Padova, Italy
        Nicole Tsao Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
        Aleksandra Turkiewicz Clinical Epidemiology Unit, Department of Clinical Sciences, Lund University, Lund, Sweden
        Sarah Twardowski Department of Epidemiology and Environmental Health, University at Buffalo, Buffalo, NY, USA
        Charles R. Twardy KeyW Corporation, Herndon, VA, USA
        Janusz Uchmanski Cardinal Stefan Wyszynski University, Warsaw, Poland
        Cesar Augusto Ugarte-Gil Instituto de Medicina Tropical Alexander von Humboldt, Universidad Peruana Cayetano Heredia, Lima, Peru
        Jessica Utts Department of Statistics, University of California – Irvine, Irvine, CA, USA
        Alessandro Vagheggini Istituto Scientifico Romagnolo per lo Studio e la Cura dei Tumori (IRST) IRCCS, Meldola, Italy
        Claudia Valeggia Department of Anthropology, Yale University, New Haven, CT, USA
        Ben Van Calster Department of Development and Regeneration, KU Leuven, Leuven, Belgium
        Rens van de Schoot Department of Methodology & Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands
        Robert van den Berg ESP Science & Education, Vienna, Austria
        Marleen M. H. J. van Gelder Department for Health Evidence, Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
        Frank van Leth Amsterdam Institute for Global Health and Development, Amsterdam, The Netherlands
        Tyler J. VanderWeele Harvard T. H. Chan School of Public Health, Boston, MA, USA
        Ivan Vankov Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, Bulgaria
        Santiago Velasco-Forero Center of Mathematical Morphology, MINES ParisTech, PSL Research University, France
        Kabisha Velauthapillai Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
        Søren Viborg Vestergaard Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
        Andrew D. Vigotsky Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
        Anders Ryom Villadsen Department of Management, Aarhus University, Aarhus, Denmark
        Marco Vinceti Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy
        Eva Vivalt Research School of Economics, Australian National University, Canberra, Australia
        Steven C. Vlad Tufts University School of Medicine, Boston, MA, USA
        Long Vo Hoang Institute of Preventive Medicine and Public Health, Hanoi Medical University, Hanoi, Vietnam
        Constantin Volkmann Department of Psychiatry and Psychotherapy, Charité Berlin, Berlin, Germany
        Raphael S. von Büren Department of Environmental Sciences, Botany, University of Basel, Basel, Switzerland
        Chat Wacharamanotham Department of Informatics, University of Zürich, Zürich, Switzerland
        Philippe Wagner Department of Clinical Sciences, Lund University, Lund, Sweden
        Jeff Walker Department of Biological Sciences, University of Southern Maine, Portland, ME, USA
        Shirley Wang Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
        Nicole Warrington Diamantina Institute, Faculty of Medicine, University of Queensland, Queensland, Australia
        Tim R. Watkins School of Public Health, University of Sydney, Sydney, Australia
        Oliver P. Watson Evariste Technologies, Reading, UK
        Hilary Watt Department of Primary Care and Public Health, Imperial College, London, UK
        Ethan Weed Department of Linguistics, Cognitive Science and Semiotics, Aarhus University, Aarhus, Denmark
        Martin Wegrzyn Department of Psychology, Bielefeld University, Bielefeld, Germany
        Sebastian Weichwald Max Planck Institute for Intelligent Systems, Tübingen, Germany
        Clarice Weinberg Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
        Helen Weiss London School of Hygiene & Tropical Medicine, London, UK
        David Welch School of Computer Science, University of Auckland, Auckland, New Zealand
        Vivian Welch Campbell Collaboration, Ottawa, ON, Canada
        Gregory Wellenius Department of Epidemiology, Brown University, Providence, RI, USA
        Valerie Welty Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
        Cynthia M. Westerhout Canadian VIGOUR Centre, University of Alberta, Edmonton, AB, Canada
        Ruud Wetzels Center for Accounting, Auditing and Control, Nyenrode Business University, Breukelen, The Netherlands
        Ben Whalley School of Psychology, Plymouth University, Plymouth, UK
        Keith Wheatley Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
        Thomas E. White The University of Sydney, Sydney, Australia
        Cory W. Whitney Institute of Crop Sciences and Resource Conservation, Center for Development Research, University of Bonn, Bonn, Germany
        Paweł Wiczling Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Gdańsk, Poland
        Brenton M. Wiernik Department of Psychology, University of South Florida, Tampa, FL, USA
        Allen J. Wilcox Epidemiology Branch, National Institute of Environmental Health Sciences, NIH, Durham, NC, USA
        Justin Wilkins Occams, Amstelveen, The Netherlands
        Donald Williams Department of Psychology, University of California – Davis, Davis, CA, USA
        Chris H. Wilson Agronomy Department, University of Florida, Gainesville, FL, USA
        John Wilson Ivey Business School, Western University, London, ON, Canada
        Patrick Michael Wilson Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
        Dawid Winiarczyk Institute of Genetics and Animal Breeding PAS, Jastrzębiec, Poland
        Lauren A. Wise Boston University School of Public Health, Boston, MA, USA
        Lauren E. Wisk David Geffen School of Medicine, University of California – Los Angeles, Los Angeles, CA, USA
        Torbjørn Wisløff Norwegian Institute of Public Health, Oslo, Norway
        William H. Woodall Virginia Tech, Blacksburg, Virginia, USA
        Jesper N. Wulff Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
        Laure Wynants KU Leuven, Leuven, Belgium; Maastricht University, Maastricht, The Netherlands
        Abraham J. Wyner Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
        Tomasz Wyszomirski Faculty of Biology, Biological & Chemical Research Center, Warsaw University, Poland
        Yan Xie Clinical Epidemiology Center, Saint Louis VA Health Care System, Saint Louis, MO, USA
        Nancy Yacovzada Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
        Lakshmi Narayana Yaddanapudi Department of Anaesthesia and Intensive Care, Postgraduate Institute of Medical Education and Research, Chandigarh, India
        Yuki Yamada Faculty of Arts and Science, Kyushu University, Fukuoka, Japan
        Yuki Yanai School of Economics & Management, Kochi University of Technology, Kochi, Japan
        Nigel Yoccoz UiT The Arctic University of Norway, Tromsø, Norway
        Isao Yokota Department of Biostatistics, Hokkaido University, Sapporo, Japan
        Cristobal Young Department of Sociology, Cornell University, Ithaca, NY, USA
        Yongfu Yu Department of Epidemiology, University of California – Los Angeles, Los Angeles, CA, USA
        Yiqiang Zhan Karolinska Institutet, Stockholm, Sweden
        Sizheng Steven Zhao Musculoskeletal Biology, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
        Yingjie Zheng Department of Epidemiology, Fudan University School of Public Health, Shanghai, China
        Boback Ziaeian Division of Cardiology, David Geffen School of Medicine, University of California – Los Angeles, Los Angeles, CA, USA
        Stephen T. Ziliak College of Arts and Sciences, Roosevelt University, Chicago, IL, USA
        Felipe C. M. Zoppino Oncology Laboratory, Instituto de Medicina y Biologia Experimental de Cuyo, CCT CONICET Mendoza, Mendoza, Argentina
        Marcel Zwahlen Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland

      https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913


      Taylor and Francis Online


      The American Statistician, Volume 73, 2019 – Issue sup1: Statistical Inference in the 21st Century: A World Beyond p < 0.05

      Open access

      233,801 views | 643 CrossRef citations to date | 1400 Altmetric

      Editorial

      Moving to a World Beyond “p < 0.05”

      Ronald L. Wasserstein, Allen L. Schirm & Nicole A. Lazar | Pages 1–19 | Published online: 20 Mar 2019


      This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

      March 16, 2019

      Some of you exploring this special issue of The American Statistician might be wondering if it’s a scolding from pedantic statisticians lecturing you about what not to do with p-values, without offering any real ideas of what to do about the very hard problem of separating signal from noise in data and making decisions under uncertainty. Fear not. In this issue, thanks to 43 innovative and thought-provoking papers from forward-looking statisticians, help is on the way.

      1 “Don’t” Is Not Enough

      There’s not much we can say here about the perils of p-values and significance testing that hasn’t been said already for decades (Ziliak and McCloskey 2008; Hubbard 2016). If you’re just arriving to the debate, here’s a sampling of what not to do:

      • Don’t base your conclusions solely on whether an association or effect was found to be “statistically significant” (i.e., the p-value passed some arbitrary threshold such as p < 0.05).
      • Don’t believe that an association or effect exists just because it was statistically significant.
      • Don’t believe that an association or effect is absent just because it was not statistically significant.
      • Don’t believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.
      • Don’t conclude anything about scientific or practical importance based on statistical significance (or lack thereof).

      Don’t. Don’t. Just…don’t. Yes, we talk a lot about don’ts. The ASA Statement on p-Values and Statistical Significance (Wasserstein and Lazar 2016) was developed primarily because after decades, warnings about the don’ts had gone mostly unheeded. The statement was about what not to do, because there is widespread agreement about the don’ts.

      Knowing what not to do with p-values is indeed necessary, but it does not suffice. It is as though statisticians were asking users of statistics to tear out the beams and struts holding up the edifice of modern scientific research without offering solid construction materials to replace them. Pointing out old, rotting timbers was a good start, but now we need more.

      Recognizing this, in October 2017, the American Statistical Association (ASA) held the Symposium on Statistical Inference, a two-day gathering that laid the foundations for this special issue of The American Statistician. Authors were explicitly instructed to develop papers for the variety of audiences interested in these topics. If you use statistics in research, business, or policymaking but are not a statistician, these articles were indeed written with YOU in mind. And if you are a statistician, there is still much here for you as well.

      The papers in this issue propose many new ideas, ideas that in our determination as editors merited publication to enable broader consideration and debate. The ideas in this editorial are likewise open to debate. They are our own attempt to distill the wisdom of the many voices in this issue into an essence of good statistical practice as we currently see it: some do’s for teaching, doing research, and informing decisions.

      Yet the voices in the 43 papers in this issue do not sing as one. At times in this editorial and the papers you’ll hear deep dissonance, the echoes of “statistics wars” still simmering today (Mayo 2018). At other times you’ll hear melodies wrapping in a rich counterpoint that may herald an increasingly harmonious new era of statistics. To us, these are all the sounds of statistical inference in the 21st century, the sounds of a world learning to venture beyond “p < 0.05.”

      This is a world where researchers are free to treat “p = 0.051” and “p = 0.049” as not being categorically different, where authors no longer find themselves constrained to selectively publish their results based on a single magic number. In this world, where studies with “p < 0.05” and studies with “p > 0.05” are not automatically in conflict, researchers will see their results more easily replicated—and, even when not, they will better understand why. As we venture down this path, we will begin to see fewer false alarms, fewer overlooked discoveries, and the development of more customized statistical strategies. Researchers will be free to communicate all their findings in all their glorious uncertainty, knowing their work is to be judged by the quality and effective communication of their science, and not by their p-values. As “statistical significance” is used less, statistical thinking will be used more.
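      To see concretely why “p = 0.051” and “p = 0.049” should not be read as categorically different, consider a small simulation (our illustration, not part of the editorial; it assumes Python with NumPy and SciPy, and the effect size, noise level, and sample size are invented): repeated runs of the identical experiment, with the same true effect, scatter p-values on both sides of 0.05.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2019)
      true_effect, sd, n, reps = 0.4, 1.0, 50, 20

      # Rerun the same two-group experiment many times and collect p-values.
      pvals = []
      for _ in range(reps):
          control = rng.normal(0.0, sd, n)
          treated = rng.normal(true_effect, sd, n)
          pvals.append(stats.ttest_ind(treated, control).pvalue)

      # The same true effect yields p-values on both sides of the 0.05 line.
      print(np.round(sorted(pvals), 3))
      print(f"fraction below 0.05: {np.mean(np.array(pvals) < 0.05):.2f}")

      A bright-line rule would declare some of these identical experiments to be “in conflict” with the others; reading the p-values as continuous, noisy evidence would not.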

      The ASA Statement on P-Values and Statistical Significance started moving us toward this world. As of the date of publication of this special issue, the statement has been viewed over 294,000 times and cited over 1700 times—an average of about 11 citations per week since its release. Now we must go further. That’s what this special issue of The American Statistician sets out to do.

      To get to the do’s, though, we must begin with one more don’t.

      2 Don’t Say “Statistically Significant”

      The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way.

      Regardless of whether it was ever useful, a declaration of “statistical significance” has today become meaningless. Made broadly known by Fisher’s use of the phrase (1925), Edgeworth’s (1885) original intention for statistical significance was simply as a tool to indicate when a result warrants further scrutiny. But that idea has been irretrievably lost. Statistical significance was never meant to imply scientific importance, and the confusion of the two was decried soon after its widespread use (Boring 1919). Yet a full century later the confusion persists.

      And so the tool has become the tyrant. The problem is not simply use of the word “significant,” although the statistical and ordinary language meanings of the word are indeed now hopelessly confused (Ghose 2013); the term should be avoided for that reason alone. The problem is a larger one, however: using bright-line rules for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making (ASA statement, Principle 3). A label of statistical significance adds nothing to what is already conveyed by the value of p; in fact, this dichotomization of p-values makes matters worse.

      For example, no p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical nonsignificance lead to the association or effect being improbable, absent, false, or unimportant. Yet the dichotomization into “significant” and “not significant” is taken as an imprimatur of authority on these characteristics. In a world without bright lines, on the other hand, it becomes untenable to assert dramatic differences in interpretation from inconsequential differences in estimates. As Gelman and Stern (2006) famously observed, the difference between “significant” and “not significant” is not itself statistically significant.
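      Gelman and Stern’s point is easy to verify with a short worked example (a sketch with invented numbers in the spirit of their paper, assuming SciPy and independent, approximately normal estimates): estimate A is conventionally “significant,” estimate B is not, yet the difference between them is nowhere near significant.

      from math import sqrt
      from scipy.stats import norm

      def two_sided_p(est, se):
          # Two-sided p-value for a normal test of "effect = 0"
          return 2 * norm.sf(abs(est / se))

      est_a, se_a = 25.0, 10.0  # "significant"
      est_b, se_b = 10.0, 10.0  # "not significant"
      diff, se_diff = est_a - est_b, sqrt(se_a**2 + se_b**2)  # assumes independence

      print(f"A: p = {two_sided_p(est_a, se_a):.3f}")        # ~0.012
      print(f"B: p = {two_sided_p(est_b, se_b):.3f}")        # ~0.317
      print(f"A - B: p = {two_sided_p(diff, se_diff):.3f}")  # ~0.289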

      Furthermore, this false split into “worthy” and “unworthy” results leads to the selective reporting and publishing of results based on their statistical significance—the so-called “file drawer problem” (Rosenthal 1979). And the dichotomized reporting problem extends beyond just publication, notes Amrhein, Trafimow, and Greenland (2019): when authors use p-value thresholds to select which findings to discuss in their papers, “their conclusions and what is reported in subsequent news and reviews will be biased…Such selective attention based on study outcomes will therefore not only distort the literature but will slant published descriptions of study results—biasing the summary descriptions reported to practicing professionals and the general public.” For the integrity of scientific publishing and research dissemination, therefore, whether a p-value passes any arbitrary threshold should not be considered at all when deciding which results to present or highlight.

      To be clear, the problem is not that of having only two labels. Results should not be trichotomized, or indeed categorized into any number of groups, based on arbitrary p-value thresholds. Similarly, we need to stop using confidence intervals as another means of dichotomizing (based on whether a null value falls within the interval). And, to preclude a reappearance of this problem elsewhere, we must not begin arbitrarily categorizing other statistical measures (such as Bayes factors).

      Despite the limitations of p-values (as noted in Principles 5 and 6 of the ASA statement), however, we are not recommending that the calculation and use of continuous p-values be discontinued. Where p-values are used, they should be reported as continuous quantities (e.g., p = 0.08). They should also be described in language stating what the value means in the scientific context. We believe that a reasonable prerequisite for reporting any p-value is the ability to interpret it appropriately. We say more about this in Section 3.3.
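      In practice this recommendation is largely a reporting habit. A minimal sketch of what it looks like in analysis output (our illustration, with invented values): the p-value appears as a continuous quantity alongside the estimate and interval, never as a star or a verdict.

      def report(label, estimate, lo, hi, p, units):
          # Report the estimate, its interval, and a continuous p-value together.
          return (f"{label}: estimated difference {estimate:g} {units} "
                  f"(95% interval {lo:g} to {hi:g}; p = {p:.2f})")

      # Instead of "the effect was not significant (ns)":
      print(report("Drug vs placebo", 3.2, -0.4, 6.8, 0.08, "mmHg"))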

      To move forward to a world beyond “p < 0.05,” we must recognize afresh that statistical inference is not—and never has been—equivalent to scientific inference (Hubbard, Haig, and Parsa 2019; Ziliak 2019). However, looking to statistical significance for a marker of scientific observations’ credibility has created a guise of equivalency. Moving beyond “statistical significance” opens researchers to the real significance of statistics, which is “the science of learning from data, and of measuring, controlling, and communicating uncertainty” (Davidian and Louis 2012).

      In sum, “statistically significant”—don’t say it and don’t use it.

      3 There Are Many Do’s

      With the don’ts out of the way, we can finally discuss ideas for specific, positive, constructive actions. We have a massive list of them in the seventh section of this editorial! In that section, the authors of all the articles in this special issue each provide their own short set of do’s. Those lists, and the rest of this editorial, will help you navigate the substantial collection of articles that follows.

      Because of the size of this collection, we take the liberty here of distilling our readings of the articles into a summary of what can be done to move beyond “p < 0.05.” You will find the rich details in the articles themselves.

      What you will NOT find in this issue is one solution that majestically replaces the outsized role that statistical significance has come to play. The statistical community has not yet converged on a simple paradigm for the use of statistical inference in scientific research—and in fact it may never do so. A one-size-fits-all approach to statistical inference is an inappropriate expectation, even after the dust settles from our current remodeling of statistical practice (Tong 2019). Yet solid principles for the use of statistics do exist, and they are well explained in this special issue.

      We summarize our recommendations in two sentences totaling seven words: “Accept uncertainty. Be thoughtful, open, and modest.” Remember “ATOM.”

      3.1 Accept Uncertainty

      Uncertainty exists everywhere in research. And, just like with the frigid weather in a Wisconsin winter, there are those who will flee from it, trying to hide in warmer havens elsewhere. Others, however, accept and even delight in the omnipresent cold; these are the ones who buy the right gear and bravely take full advantage of all the wonders of a challenging climate. Significance tests and dichotomized p-values have turned many researchers into scientific snowbirds, trying to avoid dealing with uncertainty by escaping to a “happy place” where results are either statistically significant or not. In the real world, data provide a noisy signal. Variation, one of the causes of uncertainty, is everywhere. Exact replication is difficult to achieve. So it is time to get the right (statistical) gear and “move toward a greater acceptance of uncertainty and embracing of variation” (Gelman 2016).

      Statistical methods do not rid data of their uncertainty. “Statistics,” Gelman (2016) says, “is often sold as a sort of alchemy that transmutes randomness into certainty, an ‘uncertainty laundering’ that begins with data and concludes with success as measured by statistical significance.” To accept uncertainty requires that we “treat statistical results as being much more incomplete and uncertain than is currently the norm” (Amrhein, Trafimow, and Greenland 2019). We must “countenance uncertainty in all statistical conclusions, seeking ways to quantify, visualize, and interpret the potential for error” (Calin-Jageman and Cumming 2019).

      “Accept uncertainty and embrace variation in effects,” advise McShane et al. in Section 7 of this editorial. “[W]e can learn much (indeed, more) about the world by forsaking the false promise of certainty offered by dichotomous declarations of truth or falsity—binary statements about there being ‘an effect’ or ‘no effect’—based on some p-value or other statistical threshold being attained.”

      We can make acceptance of uncertainty more natural to our thinking by accompanying every point estimate in our research with a measure of its uncertainty such as a standard error or interval estimate. Reporting and interpreting point and interval estimates should be routine. However, simplistic use of confidence intervals as a measurement of uncertainty leads to the same bad outcomes as use of statistical significance (especially, a focus on whether such intervals include or exclude the “null hypothesis value”). Instead, Greenland (2019) and Amrhein, Trafimow, and Greenland (2019) encourage thinking of confidence intervals as “compatibility intervals,” which use p-values to show the effect sizes that are most compatible with the data under the given model.
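      The “compatibility” reading of an interval can be made concrete with a short sketch (ours, not code from Greenland or Amrhein et al.; it assumes a normal approximation and SciPy, with an invented estimate and standard error): scan candidate effect sizes, compute a p-value for each, and treat the interval as the set of effects most compatible with the data rather than a binary in/out test of the null.

      import numpy as np
      from scipy.stats import norm

      estimate, se = 3.2, 1.8  # illustrative values

      # Two-sided p-value for each hypothesized effect size theta
      thetas = np.linspace(estimate - 4 * se, estimate + 4 * se, 401)
      pvals = 2 * norm.sf(np.abs((estimate - thetas) / se))

      compatible = thetas[pvals >= 0.05]  # the 95% compatibility interval
      print(f"95% compatibility interval: "
            f"{compatible.min():.2f} to {compatible.max():.2f}")

      Plotting pvals against thetas gives the full compatibility curve: effect sizes near the point estimate are highly compatible with the data, compatibility falls off gradually, and nothing special happens at the interval’s endpoints.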

      How will accepting uncertainty change anything? To begin, it will prompt us to seek better measures, more sensitive designs, and larger samples, all of which increase the rigor of research. It also helps us be modest (the fourth of our four principles, on which we will expand in Section 3.4) and encourages “meta-analytic thinking” (Cumming 2014). Accepting uncertainty as inevitable is a natural antidote to the seductive certainty falsely promised by statistical significance. With this new outlook, we will naturally seek out replications and the integration of evidence through meta-analyses, which usually requires point and interval estimates from contributing studies. This will in turn give us more precise overall estimates for our effects and associations. And this is what will lead to the best research-based guidance for practical decisions.
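      The gain in precision from meta-analytic thinking follows directly from the arithmetic of inverse-variance weighting. A minimal fixed-effect sketch (ours, with invented study results, assuming NumPy) shows how point and interval estimates from several studies pool into a tighter overall estimate:

      import numpy as np

      estimates = np.array([0.30, 0.55, 0.18, 0.42])  # per-study point estimates
      ses       = np.array([0.20, 0.25, 0.15, 0.30])  # per-study standard errors

      # Weight each study by the inverse of its variance and pool.
      weights = 1.0 / ses**2
      pooled = np.sum(weights * estimates) / np.sum(weights)
      pooled_se = np.sqrt(1.0 / np.sum(weights))

      print(f"pooled estimate: {pooled:.3f} "
            f"(95% CI {pooled - 1.96 * pooled_se:.3f} "
            f"to {pooled + 1.96 * pooled_se:.3f})")

      The pooled standard error (about 0.10 here) is smaller than that of any single contributing study, which is exactly the “more precise overall estimates” described above.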

      Accepting uncertainty leads us to be thoughtful, the second of our four principles.

      3.2 Be Thoughtful

      What do we mean by this exhortation to “be thoughtful”? Researchers already clearly put much thought into their work. We are not accusing anyone of laziness. Rather, we are envisioning a sort of “statistical thoughtfulness.” In this perspective, statistically thoughtful researchers begin above all else with clearly expressed objectives. They recognize when they are doing exploratory studies and when they are doing more rigidly pre-planned studies. They invest in producing solid data. They consider not one but a multitude of data analysis techniques. And they think about so much more.

      3.2.1 Thoughtfulness in the Big Picture

      “[M]ost scientific research is exploratory in nature,” Tong (2019) contends. “[T]he design, conduct, and analysis of a study are necessarily flexible, and must be open to the discovery of unexpected patterns that prompt new questions and hypotheses. In this context, statistical modeling can be exceedingly useful for elucidating patterns in the data, and researcher degrees of freedom can be helpful and even essential, though they still carry the risk of overfitting. The price of allowing this flexibility is that the validity of any resulting statistical inferences is undermined.”

      Calin-Jageman and Cumming (2019) caution that “in practice the dividing line between planned and exploratory research can be difficult to maintain. Indeed, exploratory findings have a slippery way of ‘transforming’ into planned findings as the research process progresses.” At the bottom of that slippery slope one often finds results that don’t reproduce.

      Anderson (2019) proposes three questions for thoughtful researchers to ask when evaluating research results: What are the practical implications of the estimate? How precise is the estimate? And is the model correctly specified? The latter question leads naturally to three more: Are the modeling assumptions understood? Are these assumptions valid? And do the key results hold up when other modeling choices are made? Anderson further notes, “Modeling assumptions (including all the choices from model specification to sample selection and the handling of data issues) should be sufficiently documented so independent parties can critique, and replicate, the work.”

      Drawing on archival research done at the Guinness Archives in Dublin, Ziliak (2019) emerges with ten “G-values” he believes we all wish to maximize in research. That is, we want large G-values, not small p-values. The ten principles of Ziliak’s “Guinnessometrics” are derived primarily from his examination of experiments conducted by statistician William Sealy Gosset while working as Head Brewer for Guinness. Gosset took an economic approach to the logic of uncertainty, preferring balanced designs over random ones and estimation of gambles over bright-line “testing.” Take, for example, Ziliak’s G-value 10: “Consider purpose of the inquiry, and compare with best practice,” in the spirit of what farmers and brewers must do. The purpose is generally NOT to falsify a null hypothesis, says Ziliak. Ask what is at stake, he advises, and determine what magnitudes of change are humanly or scientifically meaningful in context.

      Pogrow (2019) offers an approach based on practical benefit rather than statistical or practical significance. This approach is especially useful, he says, for assessing whether interventions in complex organizations (such as hospitals and schools) are effective, and also for increasing the likelihood that the observed benefits will replicate in subsequent research and in clinical practice. In this approach, “practical benefit” recognizes that reliance on small effect sizes can be as problematic as relying on p-values.

      Thoughtful research prioritizes sound data production by putting energy into the careful planning, design, and execution of the study (Tong 2019).

      Locascio (2019) urges researchers to be prepared for a new publishing model that evaluates their research based on the importance of the questions being asked and the methods used to answer them, rather than the outcomes obtained.

      3.2.2 Thoughtfulness Through Context and Prior Knowledge

      Thoughtful research considers the scientific context and prior evidence. In this regard, a declaration of statistical significance is the antithesis of thoughtfulness: it says nothing about practical importance, and it ignores what previous studies have contributed to our knowledge.

      Thoughtful research looks ahead to prospective outcomes in the context of theory and previous research. Researchers would do well to ask, What do we already know, and how certain are we in what we know? And building on that and on the field’s theory, what magnitudes of differences, odds ratios, or other effect sizes are practically important? These questions would naturally lead a researcher, for example, to use existing evidence from a literature review to identify specifically the findings that would be practically important for the key outcomes under study.

      Thoughtful research includes careful consideration of the definition of a meaningful effect size. As a researcher you should communicate this up front, before data are collected and analyzed. Afterwards is just too late; it is dangerously easy to justify observed results after the fact and to overinterpret trivial effect sizes as being meaningful. Many authors in this special issue argue that consideration of the effect size and its “scientific meaningfulness” is essential for reliable inference (e.g., Blume et al. 2019; Betensky 2019). This concern is also addressed in the literature on equivalence testing (Wellek 2017).

      Thoughtful research considers “related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain…without giving priority to p-values or other purely statistical measures” (McShane et al. 2019).

      Thoughtful researchers “use a toolbox of statistical techniques, employ good judgment, and keep an eye on developments in statistical and data science,” conclude Heck and Krueger (2019), who demonstrate how the p-value can be useful to researchers as a heuristic.

      3.2.3 Thoughtful Alternatives and Complements to P-Values

      Thoughtful research considers multiple approaches for solving problems. This special issue includes some ideas for supplementing or replacing p-values. Here is a short summary of some of them, with a few technical details:

      Amrhein, Trafimow, and Greenland (2019) and Greenland (2019) advise that null p-values should be supplemented with a p-value from a test of a pre-specified alternative (such as a minimal important effect size). To reduce confusion with posterior probabilities and better portray evidential value, they further advise that p-values be transformed into s-values (Shannon information, surprisal, or binary logworth), s = −log2(p). This measure of evidence affirms other arguments that the evidence against a hypothesis contained in the p-value is not nearly as strong as is believed by many researchers. The change of scale also moves users away from probability misinterpretations of the p-value.
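
      As a minimal illustration of the transformation (the p-values below are arbitrary examples):

      # Sketch: s-values (Shannon information) from p-values, s = -log2(p).
      # An s-value of k bits carries the same evidential weight as seeing
      # k heads in a row when testing whether a coin is fair.
      import math

      for p in [0.25, 0.05, 0.03, 0.005]:
          print(f"p = {p}: s = {-math.log2(p):.1f} bits against the hypothesis")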

      Blume et al. (2019) offer a “second generation p-value (SGPV),” the characteristics of which mimic or improve upon those of p-values but take practical significance into account. The null hypothesis from which an SGPV is computed is a composite hypothesis representing a range of differences that would be practically or scientifically inconsequential, as in equivalence testing (Wellek 2017). This range is determined in advance by the experimenters. When the SGPV is 1, the data only support null hypotheses; when the SGPV is 0, the data are incompatible with any of the null hypotheses. SGPVs between 0 and 1 are inconclusive at varying levels (maximally inconclusive at or near SGPV = 0.5). Blume et al. illustrate how the SGPV provides a straightforward and useful descriptive summary of the data. They argue that it resolves the problem that classical statistical significance does not imply scientific relevance, that it lowers false discovery rates, and that its conclusions are more likely to reproduce in subsequent studies.
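
      The sketch below, in Python, follows the interval-overlap construction that Blume and colleagues describe; the intervals and indifference zone are hypothetical, and readers should consult their paper for the exact formulation:

      # Sketch of a second-generation p-value for a composite null (indifference
      # zone), based on the overlap between the interval estimate and that zone.
      def sgpv(ci_lo, ci_hi, null_lo, null_hi):
          ci_len = ci_hi - ci_lo
          null_len = null_hi - null_lo
          overlap = max(0.0, min(ci_hi, null_hi) - max(ci_lo, null_lo))
          # The correction factor caps very wide, uninformative intervals at 0.5.
          return (overlap / ci_len) * max(ci_len / (2 * null_len), 1.0)

      print(sgpv(0.15, 0.45, -0.1, 0.1))   # 0.0: data incompatible with all null effects
      print(sgpv(-0.05, 0.05, -0.1, 0.1))  # 1.0: data support only null effects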

      The “analysis of credibility” (AnCred) is promoted by Matthews (2019). This approach takes account of both the width of the confidence interval and the location of its bounds when assessing weight of evidence. AnCred assesses the credibility of inferences based on the confidence interval by determining the level of prior evidence needed for a new finding to provide credible evidence for a nonzero effect. If this required level of prior evidence is supported by current knowledge and insight, Matthews calls the new result “credible evidence for a non-zero effect,” irrespective of its statistical significance/nonsignificance.

      Colquhoun (2019) proposes continuing the use of continuous p-values, but only in conjunction with the “false positive risk (FPR).” The FPR answers the question, “If you observe a ‘significant’ p-value after doing a single unbiased experiment, what is the probability that your result is a false positive?” It tells you what most people mistakenly still think the p-value does, Colquhoun says. The problem, however, is that to calculate the FPR you need to specify the prior probability that an effect is real, and it’s rare to know this. Colquhoun suggests that the FPR could be calculated with a prior probability of 0.5, the largest value reasonable to assume in the absence of hard prior data. The FPR found this way is in a sense the minimum false positive risk (mFPR); less plausible hypotheses (prior probabilities below 0.5) would give even bigger FPRs, Colquhoun says, but the mFPR would be a big improvement on reporting a p-value alone. He points out that p-values near 0.05 are, under a variety of assumptions, associated with minimum false positive risks of 20–30%, which should stop a researcher from making too big a claim about the “statistical significance” of such a result.
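
      One rough way to reconstruct such a calculation is sketched below in Python. The normal test model, the assumed 80% power, and the 0.5 prior are illustrative assumptions rather than Colquhoun's exact setup, though the result lands in the 20–30% range he reports:

      # Rough sketch of a false positive risk (FPR) for an observed two-sided
      # p-value, assuming a normal test, 80% power, and a 0.5 prior probability
      # that the effect is real. All three assumptions are for illustration only.
      from scipy.stats import norm

      def fpr(p, power=0.8, prior_real=0.5):
          z_obs = norm.isf(p / 2)                      # observed z for a two-sided p
          mu = norm.isf(0.025) + norm.isf(1 - power)   # alternative mean giving that power
          like_h1 = norm.pdf(z_obs - mu) + norm.pdf(-z_obs - mu)
          like_h0 = 2 * norm.pdf(z_obs)                # density of the observed z under the null
          prior_odds = prior_real / (1 - prior_real)
          return 1 / (1 + (like_h1 / like_h0) * prior_odds)

      print(round(fpr(0.05), 2))  # about 0.29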

      Benjamin and Berger (2019) propose a different supplement to the null p-value. The Bayes factor bound (BFB)—which under typically plausible assumptions is the value 1/(−e·p·ln p), valid for p < 1/e—represents the upper bound of the ratio of data-based odds of the alternative hypothesis to the null hypothesis. Benjamin and Berger advise that the BFB should be reported along with the continuous p-value. This is an incomplete step toward revising practice, they argue, but one that at least confronts the researcher with the maximum possible odds that the alternative hypothesis is true—which is what researchers often think they are getting with a p-value. The BFB, like the FPR, often clarifies that the evidence against the null hypothesis contained in the p-value is not nearly as strong as is believed by many researchers.
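
      The bound itself takes one line to compute. The sketch below, in Python, evaluates it at a few illustrative p-values and then, assuming 1:1 prior odds purely for illustration, the implied minimum probability that the null hypothesis is true:

      # Sketch: the Bayes factor bound, BFB = 1/(-e * p * ln p), valid for p < 1/e.
      import math

      for p in [0.05, 0.01, 0.005]:
          bfb = 1 / (-math.e * p * math.log(p))  # maximum odds of alternative over null
          print(f"p = {p}: BFB = {bfb:.1f}, so P(null) >= {1 / (1 + bfb):.2f} at 1:1 prior odds")

      At p = 0.05 the data-based odds in favor of the alternative are at most about 2.5 to 1, which is the sense in which the evidence is far weaker than many researchers suppose.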

      Goodman, Spruill, and Komaroff (2019) propose a two-stage approach to inference, requiring both a p-value below a pre-specified level and an effect size above a pre-specified magnitude before declaring a result “significant.” They argue that this method has improved performance relative to use of dichotomized p-values alone.
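
      The idea reduces to a conjunction of two pre-specified conditions, as in this hypothetical sketch (both thresholds are invented for illustration):

      # Sketch of a two-stage declaration: require a small p-value AND a large
      # enough estimated effect, with both thresholds fixed before the analysis.
      def two_stage_significant(p, effect, alpha=0.05, min_effect=0.3):
          return p < alpha and abs(effect) >= min_effect

      print(two_stage_significant(p=0.01, effect=0.1))  # False: precise but trivial effect
      print(two_stage_significant(p=0.01, effect=0.5))  # True: precise and sizable effect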

      Gannon, Pereira, and Polpo (2019) have developed a testing procedure combining frequentist and Bayesian tools to provide a significance level that is a function of sample size.

      Manski (2019) and Manski and Tetenov (2019) urge a return to the use of statistical decision theory, which they say has largely been forgotten. Statistical decision theory is not based on p-value thresholds and readily distinguishes between statistical and clinical significance.

      Billheimer (2019) suggests abandoning inference about parameters, which are frequently hypothetical quantities used to idealize a problem. Instead, he proposes focusing on the prediction of future observables, and their associated uncertainty, as a means to improving science and decision-making.

      3.2.4 Thoughtful Communication of Confidence

      Be thoughtful and clear about the level of confidence or credibility that is present in statistical results.

      Amrhein, Trafimow, and Greenland (2019) and Greenland (2019) argue that the use of words like “significance” in conjunction with p-values and “confidence” with interval estimates misleads users into overconfident claims. They propose that researchers think of p-values as measuring the compatibility between hypotheses and data, and interpret interval estimates as “compatibility intervals.”

      In what may be a controversial proposal, Goodman (2018) suggests requiring “that any researcher making a claim in a study accompany it with their estimate of the chance that the claim is true.” Goodman calls this the confidence index. For example, along with stating “This drug is associated with elevated risk of a heart attack, relative risk (RR) = 2.4, p = 0.03,” Goodman says investigators might add a statement such as “There is an 80% chance that this drug raises the risk, and a 60% chance that the risk is at least doubled.” Goodman acknowledges, “Although simple on paper, requiring a confidence index would entail a profound overhaul of scientific and statistical practice.”

      In a similar vein, Hubbard and Carriquiry (2019) urge that researchers prominently display the probability the hypothesis is true or a probability distribution of an effect size, or provide sufficient information for future researchers and policy makers to compute it. The authors further describe why such a probability is necessary for decision making, how it could be estimated by using historical rates of reproduction of findings, and how this same process can be part of continuous “quality control” for science.

      Being thoughtful in our approach to research will lead us to be open in our design, conduct, and presentation of it as well.

      3.3 Be Open

      We envision openness as embracing certain positive practices in the development and presentation of research work.

      3.3.1 Openness to Transparency and to the Role of Expert Judgment

      First, we repeat oft-repeated advice: Be open to “open science” practices. Calin-Jageman and Cumming (2019), Locascio (2019), and others in this special issue urge adherence to practices such as public pre-registration of methods, transparency and completeness in reporting, shared data and code, and even pre-registered (“results-blind”) review. Completeness in reporting, for example, requires not only describing all analyses performed but also presenting all findings obtained, without regard to statistical significance or any such criterion.

      Openness also includes understanding and accepting the role of expert judgment, which enters the practice of statistical inference and decision-making in numerous ways (O’Hagan 2019). “Indeed, there is essentially no aspect of scientific investigation in which judgment is not required,” O’Hagan observes. “Judgment is necessarily subjective, but should be made as carefully, as objectively, and as scientifically as possible.”

      Subjectivity is involved in any statistical analysis, Bayesian or frequentist. Gelman and Hennig (2017) observe, “Personal decision making cannot be avoided in statistical data analysis and, for want of approaches to justify such decisions, the pursuit of objectivity degenerates easily to a pursuit to merely appear objective.” One might say that subjectivity is not a problem; it is part of the solution.

      Acknowledging this, Brownstein et al. (2019) point out that expert judgment and knowledge are required in all stages of the scientific method. They examine the roles of expert judgment throughout the scientific process, especially regarding the integration of statistical and content expertise. “All researchers, irrespective of their philosophy or practice, use expert judgment in developing models and interpreting results,” say Brownstein et al. “We must accept that there is subjectivity in every stage of scientific inquiry, but objectivity is nevertheless the fundamental goal. Therefore, we should base judgments on evidence and careful reasoning, and seek wherever possible to eliminate potential sources of bias.”

      How does one rigorously elicit expert knowledge and judgment in an effective, unbiased, and transparent way? O’Hagan (2019) addresses this, discussing protocols for eliciting expert knowledge in as unbiased and scientifically sound a way as possible. It is also important for such elicited knowledge to be examined critically; comparing it to actual study results is an important diagnostic step.

      3.3.2 Openness in Communication

      Be open in your reporting. Report p-values as continuous, descriptive statistics, as we explain in Section 2. We realize that this leaves researchers without their familiar bright-line anchors. Yet if we were to propose a universal template for presenting and interpreting continuous p-values we would violate our own principles! Rather, we believe that the thoughtful use and interpretation of p-values will never adhere to a rigid rulebook, and will instead inevitably vary from study to study. Despite these caveats, we can offer recommendations for sound practices, as described below.

      In all instances, regardless of the value taken by p or any other statistic, consider what McShane et al. (2019) call the “currently subordinate factors”—the factors that should no longer be subordinate to “p < 0.05.” These include relevant prior evidence, plausibility of mechanism, study design and data quality, and the real-world costs and benefits that determine what effects are scientifically important. The scientific context of your study matters, they say, and this should guide your interpretation.

      When using p-values, remember not only Principle 5 of the ASA statement: “A p-value…does not measure the size of an effect or the importance of a result” but also Principle 6: “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.” Despite these limitations, if you present p-values, do so for more than one hypothesized value of your variable of interest (Fraser 2019; Greenland 2019), such as 0 and at least one plausible, relevant alternative, such as the minimum practically important effect size (which should be determined before analyzing the data).

      Betensky (2019) also reminds us to interpret the p-value in the context of sample size and meaningful effect size.

      Instead of p, you might consider presenting the s-value (Greenland 2019), which is described in Section 3.2. As noted in Section 3.1, you might present a confidence interval. Sound practices in the interpretation of confidence intervals include (1) discussing both the upper and lower limits and whether they have different practical implications, (2) paying no particular attention to whether the interval includes the null value, and (3) remembering that an interval is itself an estimate subject to error and generally provides only a rough indication of uncertainty given that all of the assumptions used to create it are correct and, thus, for example, does not “rule out” values outside the interval. Amrhein, Trafimow, and Greenland (2019) suggest that interval estimates be interpreted as “compatibility” intervals rather than as “confidence” intervals, showing the values that are most compatible with the data, under the model used to compute the interval. They argue that such an interpretation and the practices outlined here can help guard against overconfidence.

      It is worth noting that Tong (2019) disagrees with using p-values as descriptive statistics. “Divorced from the probability claims attached to such quantities (confidence levels, nominal Type I errors, and so on), there is no longer any reason to privilege such quantities over descriptive statistics that more directly characterize the data at hand.” He further states, “Methods with alleged generality, such as the p-value or Bayes factor, should be avoided in favor of discipline- and problem-specific solutions that can be designed to be fit for purpose.”

      Failing to be open in reporting leads to publication bias. Ioannidis (2019) notes the high level of selection bias prevalent in biomedical journals. He defines “selection” as “the collection of choices that lead from the planning of a study to the reporting of p-values.” As an illustration of one form of selection bias, Ioannidis compared “the set of p-values reported in the full text of an article with the set of p-values reported in the abstract.” The main finding, he says, “was that p-values chosen for the abstract tended to show greater significance than those reported in the text, and that the gradient was more pronounced in some types of journals and types of designs.” Ioannidis notes, however, that selection bias “can be present regardless of the approach to inference used.” He argues that in the long run, “the only direct protection must come from standards for reproducible research.”

      To be open, remember that one study is rarely enough. The words “a groundbreaking new study” might be loved by news writers but must be resisted by researchers. Breaking ground is only the first step in building a house. It will be suitable for habitation only after much more hard work.

      Be open by providing sufficient information so that other researchers can execute meaningful alternative analyses. van Dongen et al. (2019) provide an illustrative example of such alternative analyses by different groups attacking the same problem.

      Being open goes hand in hand with being modest.

      3.4 Be Modest

      Researchers of any ilk may rarely advertise their personal modesty. Yet the most successful ones cultivate a practice of being modest throughout their research, by understanding and clearly expressing the limitations of their work.

      Being modest requires a reality check (Amrhein, Trafimow, and Greenland 2019). “A core problem,” they observe, “is that both scientists and the public confound statistics with reality. But statistical inference is a thought experiment, describing the predictive performance of models about reality. Of necessity, these models are extremely simplified relative to the complexities of actual study conduct and of the reality being studied. Statistical results must eventually mislead us when they are used and communicated as if they present this complex reality, rather than a model for it. This is not a problem of our statistical methods. It is a problem of interpretation and communication of results.”

      Be modest in recognizing there is not a “true statistical model” underlying every problem, which is why it is wise to thoughtfully consider many possible models (Lavine 2019). Rougier (2019) calls on researchers to “recognize that behind every choice of null distribution and test statistic, there lurks a plausible family of alternative hypotheses, which can provide more insight into the null distribution.” p-values, confidence intervals, and other statistical measures are all uncertain. Treating them otherwise is immodest overconfidence.

      Remember that statistical tools have their limitations. Rose and McGuire (2019) show how use of stepwise regression in health care settings can lead to policies that are unfair.

      Remember also that the amount of evidence for or against a hypothesis provided by p-values near the ubiquitous p < 0.05 threshold (Johnson 2019) is usually much less than you think (Benjamin and Berger 2019; Colquhoun 2019; Greenland 2019).

      Be modest about the role of statistical inference in scientific inference. “Scientific inference is a far broader concept than statistical inference,” say Hubbard, Haig, and Parsa (2019). “A major focus of scientific inference can be viewed as the pursuit of significant sameness, meaning replicable and empirically generalizable results among phenomena. Regrettably, the obsession with users of statistical inference to report significant differences in data sets actively thwarts cumulative knowledge development.”

      The nexus of openness and modesty is to report everything while at the same time not concluding anything from a single study with unwarranted certainty. Because of the strong desire to inform and be informed, there is a relentless demand to state results with certainty. Again, accept uncertainty and embrace variation in associations and effects, because they are always there, like it or not. Understand that expressions of uncertainty are themselves uncertain. Accept that one study is rarely definitive, so encourage, sponsor, conduct, and publish replication studies. Then, use meta-analysis, evidence reviews, and Bayesian methods to synthesize evidence across studies.

      Resist the urge to overreach in the generalizability of claims. Watch out for pressure to embellish the abstract or the press release. If the study’s limitations are expressed in the paper but not in the abstract, they may never be read.

      Be modest by encouraging others to reproduce your work. Of course, for it to be reproduced readily, you will necessarily have been thoughtful in conducting the research and open in presenting it.

      Hubbard and Carriquiry (see their “do list” in Section 7) suggest encouraging reproduction of research by giving “a byline status for researchers who reproduce studies.” They would like to see digital versions of papers dynamically updated to display “Reproduced by….” below original research authors’ names or “not yet reproduced” until it is reproduced.

      Indeed, when it comes to reproducibility, Amrhein, Trafimow, and Greenland (2019) demand that we be modest in our expectations. “An important role for statistics in research is the summary and accumulation of information,” they say. “If replications do not find the same results, this is not necessarily a crisis, but is part of a natural process by which science evolves. The goal of scientific methodology should be to direct this evolution toward ever more accurate descriptions of the world and how it works, not toward ever more publication of inferences, conclusions, or decisions.”

      Referring to replication studies in psychology, McShane et al. (2019) recommend that future large-scale replication projects “should follow the ‘one phenomenon, many studies’ approach of the Many Labs project and Registered Replication Reports rather than the ‘many phenomena, one study’ approach of the Open Science Collaboration project. In doing so, they should systematically vary method factors across the laboratories involved in the project.” This approach helps achieve the goals of Amrhein, Trafimow, and Greenland (2019) by increasing understanding of why and when results replicate or fail to do so, yielding more accurate descriptions of the world and how it works. It also speaks to significant sameness versus significant difference à la Hubbard, Haig, and Parsa (2019).

      Kennedy-Shaffer’s (2019) historical perspective on statistical significance reminds us to be modest, by prompting us to recall how the current state of affairs in p-values has come to be.

      Finally, be modest by recognizing that different readers may have very different stakes on the results of your analysis, which means you should try to take the role of a neutral judge rather than an advocate for any hypothesis. This can be done, for example, by pairing every null p-value with a p-value testing an equally reasonable alternative, and by discussing the endpoints of every interval estimate (not only whether it contains the null).

      Accept that both scientific inference and statistical inference are hard, and understand that no knowledge will be efficiently advanced using simplistic, mechanical rules and procedures. Accept also that pure objectivity is an unattainable goal—no matter how laudable—and that both subjectivity and expert judgment are intrinsic to the conduct of science and statistics. Accept that there will always be uncertainty, and be thoughtful, open, and modest. ATOM.

      And to push this acronym further, we argue in the next section that institutional change is needed, so we put forward that change is needed at the ATOMIC level. Let’s go.

      4 Editorial, Educational and Other Institutional Practices Will Have to Change

      Institutional reform is necessary for moving beyond statistical significance in any context—whether journals, education, academic incentive systems, or others. Several papers in this special issue focus on reform.

      Goodman (2019) notes considerable social change is needed in academic institutions, in journals, and among funding and regulatory agencies. He suggests (see Section 7) partnering “with science reform movements and reformers within disciplines, journals, funding agencies and regulators to promote and reward ‘reproducible’ science and diminish the impact of statistical significance on publication, funding and promotion.” Similarly, Colquhoun (2019) says, “In the end, the only way to solve the problem of reproducibility is to do more replication and to reduce the incentives that are imposed on scientists to produce unreliable work. The publish-or-perish culture has damaged science, as has the judgment of their work by silly metrics.”

      Trafimow (2019), who added energy to the discussion of p-values a few years ago by banning them from the journal he edits (Fricker et al. 2019), suggests five “nonobvious changes” to editorial practice. These suggestions, which demand reevaluating traditional practices in editorial policy, will not be trivial to implement but would result in massive change in some journals.

      Locascio (2017, 2019) suggests that evaluation of manuscripts for publication should be “results-blind.” That is, manuscripts should be assessed for suitability for publication based on the substantive importance of the research without regard to their reported results. Kmetz (2019) supports this approach as well and says that it would be a huge benefit for reviewers, “freeing [them] from their often thankless present jobs and instead allowing them to review research designs for their potential to provide useful knowledge.” (See also “registered reports” from the Center for Open Science (https://cos.io/rr/) and “registered replication reports” from the Association for Psychological Science (https://www.psychologicalscience.org/publications/replication) in relation to this concept.)

      Amrhein, Trafimow, and Greenland (2019) ask if results-blind publishing means that anything goes, and then answer affirmatively: “Everything should be published in some form if whatever we measured made sense before we obtained the data because it was connected in a potentially useful way to some research question.” Journal editors, they say, “should be proud about [their] exhaustive methods sections” and base their decisions about the suitability of a study for publication “on the quality of its materials and methods rather than on results and conclusions; the quality of the presentation of the latter is only judged after it is determined that the study is valuable based on its materials and methods.”

      “A variation on this theme is pre-registered replication, where a replication study, rather than the original study, is subject to strict pre-registration (e.g., Gelman 2015),” says Tong (2019). “A broader vision of this idea (Mogil and Macleod 2017) is to carry out a whole series of exploratory experiments without any formal statistical inference, and summarize the results by descriptive statistics (including graphics) or even just disclosure of the raw data. When results from this series of experiments converge to a single working hypothesis, it can then be subjected to a pre-registered, randomized and blinded, appropriately powered confirmatory experiment, carried out by another laboratory, in which valid statistical inference may be made.”

      Hurlbert, Levine, and Utts (2019) urge abandoning the use of “statistically significant” in all its forms and encourage journals to provide instructions to authors along these lines: “There is now wide agreement among many statisticians who have studied the issue that for reporting of statistical tests yielding p-values it is illogical and inappropriate to dichotomize the p-scale and describe results as ‘significant’ and ‘nonsignificant.’ Authors are strongly discouraged from continuing this never justified practice that originated from confusions in the early history of modern statistics.”

      Hurlbert, Levine, and Utts (2019) also urge that the ASA Statement on p-Values and Statistical Significance “be sent to the editor-in-chief of every journal in the natural, behavioral and social sciences for forwarding to their respective editorial boards and stables of manuscript reviewers. That would be a good way to quickly improve statistical understanding and practice.” Kmetz (2019) suggests referring to the ASA statement whenever submitting a paper or revision to any editor, peer reviewer, or prospective reader. Hurlbert et al. encourage a “community grassroots effort” to encourage change in journal procedures.

      Campbell and Gustafson (2019) propose a statistical model for evaluating publication policies in terms of weighing novelty of studies (and the likelihood of those studies subsequently being found false) against pre-specified study power. They observe that “no publication policy will be perfect. Science is inherently challenging and we must always be willing to accept that a certain proportion of research is potentially false.”

      Statistics education will require major changes at all levels to move to a post “p < 0.05” world. Two papers in this special issue make a specific start in that direction (Maurer et al. 2019; Steel, Liermann, and Guttorp 2019), but we hope that volumes will be written on this topic in other venues. We are excited that, with support from the ASA, the US Conference on Teaching Statistics (USCOTS) will focus its 2019 meeting on teaching inference.

      The change that needs to happen demands change to editorial practice, to the teaching of statistics at every level where inference is taught, and to much more. However…

      5 It Is Going to Take Work, and It Is Going to Take Time

      If it were easy, it would have already been done, because as we have noted, this is nowhere near the first time the alarm has been sounded.

      Why is eliminating the use of p-values as a truth arbiter so hard? “The basic explanation is neither philosophical nor scientific, but sociologic; everyone uses them,” says Goodman (2019). “It’s the same reason we can use money. When everyone believes in something’s value, we can use it for real things; money for food, and p-values for knowledge claims, publication, funding, and promotion. It doesn’t matter if the p-value doesn’t mean what people think it means; it becomes valuable because of what it buys.”

      Goodman observes that statisticians alone cannot address the problem, and that “any approach involving only statisticians will not succeed.” He calls on statisticians to ally themselves “both with scientists in other fields and with broader based, multidisciplinary scientific reform movements. What statisticians can do within our own discipline is important, but to effectively disseminate or implement virtually any method or policy, we need partners.”

      “The loci of influence,” Goodman says, “include journals, scientific lay and professional media (including social media), research funders, healthcare payors, technology assessors, regulators, academic institutions, the private sector, and professional societies. They also can include policy or informational entities like the National Academies…as well as various other science advisory bodies across the government. Increasingly, they are also including non-traditional science reform organizations comprised both of scientists and of the science literate lay public…and a broad base of health or science advocacy groups…”

      It is no wonder, then, that the problem has persisted for so long. And persist it has! Hubbard (2019) looked at citation-count data on twenty-five articles and books severely critical of the effect of null hypothesis significance testing (NHST) on good science. Though issues were well known, Hubbard says, this did nothing to stem NHST usage over time.

      Greenland (personal communication, January 25, 2019) notes that cognitive biases and perverse incentives to offer firm conclusions where none are warranted can warp the use of any method. “The core human and systemic problems are not addressed by shifting blame to p-values and pushing alternatives as magic cures—especially alternatives that have been subject to little or no comparative evaluation in either classrooms or practice,” Greenland said. “What we need now is to move beyond debating only our methods and their interpretations, to concrete proposals for elimination of systemic problems such as pressure to produce noteworthy findings rather than to produce reliable studies and analyses. Review and provisional acceptance of reports before their results are given to the journal (Locascio 2019) is one way to address that pressure, but more ideas are needed since review of promotions and funding applications cannot be so blinded. The challenges of how to deal with human biases and incentives may be the most difficult we must face.” Supporting this view is McShane and Gal’s (2016, 2017) empirical demonstration of cognitive dichotomization errors among biomedical and social science researchers—and even among statisticians.

      Challenges for editors and reviewers are many. Here’s an example: Fricker et al. (2019) observed that when p-values were suspended from the journal Basic and Applied Social Psychology, authors tended to overstate conclusions.

      With all the challenges, how do we get from here to there, from a “p < 0.05” world to a post “p < 0.05” world?

      Matthews (2019) notes that “Any proposal encouraging changes in inferential practice must accept the ubiquity of NHST.…Pragmatism suggests, therefore, that the best hope of achieving a change in practice lies in offering inferential tools that can be used alongside the concepts of NHST, adding value to them while mitigating their most egregious features.”

      Benjamin and Berger (2019) propose three practices to help researchers during the transition away from use of statistical significance. “…[O]ur goal is to suggest minimal changes that would require little effort for the scientific community to implement,” they say. “Motivating this goal are our hope that easy (but impactful) changes might be adopted and our worry that more complicated changes could be resisted simply because they are perceived to be too difficult for routine implementation.”

      Yet there is also concern that progress will stop after a small step or two. Even some proponents of small steps are clear that those small steps still carry us far short of the destination.

      For example, Matthews (2019) says that his proposed methodology “is not a panacea for the inferential ills of the research community.” But that doesn’t make it useless. It may “encourage researchers to move beyond NHST and explore the statistical armamentarium now available to answer the central question of research: what does our study tell us?” he says. It “provides a bridge between the dominant but flawed NHST paradigm and the less familiar but more informative methods of Bayesian estimation.”

      Likewise, Benjamin and Berger (2019) observe, “In research communities that are deeply attached to reliance on ‘p < 0.05,’ our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA Statement…”

      Yet, like the authors of this editorial, not all authors in this special issue support gradual approaches with transitional methods.

      Some (e.g., Amrhein, Trafimow, and Greenland 2019; Hurlbert, Levine, and Utts 2019; McShane et al. 2019) prefer to rip off the bandage and abandon use of statistical significance altogether. In short, no more dichotomizing p-values into categories of “significance.” Notably, these authors do not suggest banning the use of p-values, but rather suggest using them descriptively, treating them as continuous, and assessing their weight or import with nuanced thinking, clear language, and full understanding of their properties.

      So even when there is agreement on the destination, there is disagreement about what road to take. The questions around reform need consideration and debate. It might turn out that different fields take different roads.

      The catalyst for change may well come from those people who fund, use, or depend on scientific research, say Calin-Jageman and Cumming (2019). They believe this change has not yet happened to the desired level because of “the cognitive opacity of the NHST approach: the counter-intuitive p-value (it’s good when it is small), the mysterious null hypothesis (you want it to be false), and the eminently confusable Type I and Type II errors.”

      Reviewers of this editorial asked, as some readers of it will, is a p-value threshold ever okay to use? We asked some of the authors of articles in the special issue that question as well. Authors identified four general instances. Some allowed that, while p-value thresholds should not be used for inference, they might still be useful for applications such as industrial quality control, in which a highly automated decision rule is needed and the costs of erroneous decisions can be carefully weighed when specifying the threshold. Other authors suggested that such dichotomized use of p-values was acceptable in model-fitting and variable selection strategies, again as automated tools, this time for sorting through large numbers of potential models or variables. Still others pointed out that p-values with very low thresholds are used in fields such as physics, genomics, and imaging as a filter for massive numbers of tests. The fourth instance can be described as “confirmatory setting[s] where the study design and statistical analysis plan are specified prior to data collection, and then adhered to during and after it” (Tong 2019). Tong argues these are the only proper settings for formal statistical inference. And Wellek (2017) says at present it is essential in these settings. “[B]inary decision making is indispensable in medicine and related fields,” he says. “[A] radical rejection of the classical principles of statistical inference…is of virtually no help as long as no conclusively substantiated alternative can be offered.”

      Eliminating the declaration of “statistical significance” based on p < 0.05 or other arbitrary thresholds will be easier in some venues than others. Most journals, if they are willing, could fairly rapidly implement editorial policies to effect these changes. Suggestions for how to do that are in this special issue of The American Statistician. However, regulatory agencies might require longer timelines for making changes. The U.S. Food and Drug Administration (FDA), for example, has long established drug review procedures that involve comparing p-values to significance thresholds for Phase III drug trials. Many factors demand consideration, not the least of which is how to avoid turning every drug decision into a court battle. Goodman (2019) cautions that, even as we seek change, “we must respect the reason why the statistical procedures are there in the first place.” Perhaps the ASA could convene a panel of experts, internal and external to FDA, to provide a workable new paradigm. (See Ruberg et al. 2019, who argue for a Bayesian approach that employs data from other trials as a “prior” for Phase 3 trials.)

      Change is needed. Change has been needed for decades. Change has been called for by others for quite a while. So…

      6 Why Will Change Finally Happen Now?

      In 1991, a confluence of weather events created a monster storm that came to be known as “the perfect storm,” entering popular culture through a book (Junger 1997) and a 2000 movie starring George Clooney. Concerns about reproducible science, falling public confidence in science, and the initial impact of the ASA statement in heightening awareness of long-known problems created a perfect storm, in this case a good one: a storm of motivation to make lasting change. Indeed, such change was the intent of the ASA statement, and we expect this special issue of TAS will inject enough additional energy into the storm to make its impact widely felt.

      We are not alone in this view. “60+ years of incisive criticism has not yet dethroned NHST as the dominant approach to inference in many fields of science,” note Calin-Jageman and Cumming (2019). “Momentum, though, seems to finally be on the side of reform.”

      Goodman (2019) agrees: “The initial slow speed of progress should not be discouraging; that is how all broad-based social movements move forward and we should be playing the long game. But the ball is rolling downhill, the current generation is inspired and impatient to carry this forward.”

      So, let’s do it. Let’s move beyond “statistically significant,” even if upheaval and disruption are inevitable for the time being. It’s worth it. In a world beyond “p < 0.05,” by breaking free from the bonds of statistical significance, statistics in science and policy will become more significant than ever.

      7 Authors’ Suggestions

      The editors of this special TAS issue on statistical inference asked all the contact authors to help us summarize the guidance they provided in their papers by providing us a short list of do’s. We asked them to be specific but concise and to be active—start each with a verb. Here is the complete list of the authors’ responses, ordered as the papers appear in this special issue.

      7.1 Getting to a Post “p < 0.05” Era

      Ioannidis, J., What Have We (Not) Learnt From Millions of Scientific Papers With p-Values?

      1. Do not use p-values, unless you have clearly thought about the need to use them and they still seem the best choice.
      2. Do not favor “statistically significant” results.
      3. Do be highly skeptical about “statistically significant” results at the 0.05 level.

      Goodman, S., Why Is Getting Rid of p-Values So Hard? Musings on Science and Statistics

      1. Partner with science reform movements and reformers within disciplines, journals, funding agencies and regulators to promote and reward reproducible science and diminish the impact of statistical significance on publication, funding and promotion.
      2. Speak to and write for the multifarious array of scientific disciplines, showing how statistical uncertainty and reasoning can be conveyed in non-“bright-line” ways both with conventional and alternative approaches. This should be done not just in didactic articles, but also in original or reanalyzed research, to demonstrate that it is publishable.
      3. Promote, teach and conduct meta-research within many individual scientific disciplines to demonstrate the adverse effects in each of over-reliance on and misinterpretation of p-values and significance verdicts in individual studies and the benefits of emphasizing estimation and cumulative evidence.
      4. Require reporting a quantitative measure of certainty—a “confidence index”—that an observed relationship, or claim, is true. Change analysis goal from achieving significance to appropriately estimating this confidence.
      5. Develop and share teaching materials, software, and published case examples to help with all of the do’s above, and to spread progress in one discipline to others.

      Hubbard, R., Will the ASA’s Efforts to Improve Statistical Practice be Successful? Some Evidence to the Contrary

      This list applies to the ASA and to the professional statistics community more generally.

      1. Specify, where/if possible, those situations in which the p-value plays a clearly valuable role in data analysis and interpretation.
      2. Contemplate issuing a statement abandoning the use of p-values in null hypothesis significance testing.

      Kmetz, J., Correcting Corrupt Research: Recommendations for the Profession to Stop Misuse of p-Values

      1. Refer to the ASA statement on p-values whenever submitting a paper or revision to any editor, peer reviewer, or prospective reader. Many in the field do not know of this statement, and having the support of a prestigious organization when authoring any research document will help stop corrupt research from becoming even more dominant than it is.
      2. Train graduate students and future researchers by having them reanalyze published studies and post their findings to appropriate websites or weblogs. This practice will benefit not only the students but also the professions, by increasing the amount of replicated (or nonreplicated) research available and readily accessible, as well as the reformer organizations that support replication.
      3. Join one or more of the reformer organizations formed or forming in many research fields, and support and publicize their efforts to improve the quality of research practices.
      4. Challenge editors and reviewers when they assert that incorrect practices and interpretations of research, consistent with existing null hypothesis significance testing and beliefs regarding p-values, should be followed in papers submitted to their journals. Point out that new submissions have been prepared to be consistent with the ASA statement on p-values.
      5. Promote emphasis on research quality rather than research quantity in universities and other institutions where professional advancement depends heavily on research “productivity,” by following the practices recommended in this special journal edition. This recommendation will fall most heavily on those who have already achieved success in their fields, perhaps by following an approach quite different from that which led to their success; whatever the merits of that approach may have been, one objectionable outcome of it has been the production of voluminous corrupt research and creation of an environment that promotes and protects it. We must do better.

      Hubbard, D., and Carriquiry, A., Quality Control for Scientific Research: Addressing Reproducibility, Responsiveness and Relevance

      1. Compute and prominently display the probability the hypothesis is true (or a probability distribution of an effect size) or provide sufficient information for future researchers and policy makers to compute it.
      2. Promote publicly displayed quality control metrics within your field—in particular, support tracking of reproduction studies and computing the “level 1” and even “level 2” priors as required for #1 above.
      3. Promote a byline status for researchers who reproduce studies: Digital versions are dynamically updated to display “Reproduced by….” below original research authors’ names or “Not yet reproduced” until it is reproduced.

      Brownstein, N., Louis, T., O’Hagan, A., and Pendergast, J., The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making

      1. Staff the study team with members who have the necessary knowledge, skills and experience—statistically, scientifically, and otherwise.
      2. Include key members of the research team, including statisticians, in all scientific and administrative meetings.
      3. Understand that subjective judgments are needed in all stages of a study.
      4. Make all judgments as carefully and rigorously as possible and document each decision and rationale for transparency and reproducibility.
      5. Use protocol-guided elicitation of judgments.
      6. Statisticians specifically should:
        • Refine oral and written communication skills.
        • Understand their multiple roles and obligations as collaborators.
        • Take an active leadership role as a member of the scientific team; contribute throughout all phases of the study.
        • Co-own the subject matter—understand a sufficient amount about the relevant science/policy to meld statistical and subject-area expertise.
        • Promote the expectation that your collaborators co-own statistical issues.
        • Write a statistical analysis plan for all analyses and track any changes to that plan over time.
        • Promote co-responsibility for data quality, security, and documentation.
        • Reduce unplanned and uncontrolled modeling/testing (HARKing, p-hacking); document all analyses.

      O’Hagan, A., Expert Knowledge Elicitation: Subjective but Scientific

      1. Elicit expert knowledge when data relating to a parameter of interest is weak, ambiguous or indirect.
      2. Use a well-designed protocol, such as SHELF, to ensure expert knowledge is elicited in as scientific and unbiased a way as possible.

      Kennedy-Shaffer, L., Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing

      1. Ensure that inference methods match intuitive understandings of statistical reasoning.
      2. Reduce the computational burden for nonstatisticians using statistical methods.
      3. Consider changing conditions of statistical and scientific inference in developing statistical methods.
      4. Address uncertainty quantitatively and in ways that reward increased precision.

      Hubbard, R., Haig, B. D., and Parsa, R. A., The Limited Role of Formal Statistical Inference in Scientific Inference

      1. Teach readers that although deemed equivalent in the social, management, and biomedical sciences, formal methods of statistical inference and scientific inference are very different animals.
      2. Show these readers that formal methods of statistical inference play only a restricted role in scientific inference.
      3. Instruct researchers to pursue significant sameness (i.e., replicable and empirically generalizable results) rather than significant differences in results.
      4. Demonstrate how the pursuit of significant differences actively impedes cumulative knowledge development.

      McShane, B., Tackett, J., Böckenholt, U., and Gelman, A., Large Scale Replication Projects in Contemporary Psychological Research

      1. When planning a replication study of a given psychological phenomenon, bear in mind that replication is complicated in psychological research because studies can never be direct or exact replications of one another, and thus heterogeneity—effect sizes that vary from one study of the phenomenon to the next—cannot be avoided.
      2. Future large scale replication projects should follow the “one phenomenon, many studies” approach of the Many Labs project and Registered Replication Reports rather than the “many phenomena, one study” approach of the Open Science Collaboration project. In doing so, they should systematically vary method factors across the laboratories involved in the project.
      3. Researchers analyzing the data resulting from large scale replication projects should do so via a hierarchical (or multilevel) model fit to the totality of the individual-level observations. In doing so, all theoretical moderators should be modeled via covariates while all other potential moderators—that is, method factors—should induce variation (i.e., heterogeneity).
      4. Assessments of replicability should not depend solely on estimates of effects, or worse, significance tests based on them. Heterogeneity must also be an important consideration in assessing replicability.

      7.2 Interpreting and Using p

      Greenland, S., Valid p-Values Behave Exactly as They Should: Some Misleading Criticisms of p-Values and Their Resolution With s-Values

      1. Replace any statements about statistical significance of a result with the p-value from the test, and present the p-value as an equality, not an inequality. For example, if p = 0.03 then “…was statistically significant” would be replaced by “…had p = 0.03,” and “p < 0.05” would be replaced by “p = 0.03.” (An exception: If p is so small that the accuracy becomes very poor then an inequality reflecting that limit is appropriate; e.g., depending on the sample size, p-values from normal or χ2 approximations to discrete data often lack even 1-digit accuracy when p < 0.0001.) In parallel, if p = 0.25 then “…was not statistically significant” would be replaced by “…had p = 0.25,” and “p > 0.05” would be replaced by “p = 0.25.”
      2. Present p-values for more than one possibility when testing a targeted parameter. For example, if you discuss the p-value from a test of a null hypothesis, also discuss alongside this null p-value another p-value for a plausible alternative parameter possibility (ideally the one used to calculate power in the study proposal). As another example: if you do an equivalence test, present the p-values for both the lower and upper bounds of the equivalence interval (which are used for equivalence tests based on two one-sided tests).
      3. Show confidence intervals for targeted study parameters, but also supplement them with p-values for testing relevant hypotheses (e.g., the p-values for both the null and the alternative hypotheses used for the study design or proposal, as in #2). Confidence intervals only show clearly what is in or out of the interval (i.e., a 95% interval only shows clearly what has p > 0.05 or p ≤ 0.05), but more detail is often desirable for key hypotheses under contention.
      4. Compare groups and studies directly by showing p-values and interval estimates for their differences, not by comparing p-values or interval estimates from the two groups or studies. For example, seeing p = 0.03 in males and p = 0.12 in females does not mean that different associations were seen in males and females; instead, one needs a p-value and confidence interval for the difference in the sex-specific associations to examine the between-sex difference. Similarly, if an early study reported a confidence interval which excluded the null and then a subsequent study reported a confidence interval which included the null, that does not mean the studies gave conflicting results or that the second study failed to replicate the first study; instead, one needs a p-value and confidence interval for the difference in the study-specific associations to examine the between-study difference. In all cases, differences-between-differences must be analyzed directly by statistics for that purpose.
      5. Supplement a focal p-value p with its Shannon information transform (s-value or surprisal) s = –log2(p). This measures the amount of information supplied by the test against the tested hypothesis (or model): Rounded off, the s-value s shows the number of heads in a row one would need to see when tossing a coin to get the same amount of information against the tosses being “fair” (independent with “heads” probability of 1/2) instead of being loaded for heads. For example, if p = 0.03, this represents –log2(0.03) = 5 bits of information against the hypothesis (like getting 5 heads in a trial of “fairness” with 5 coin tosses); and if p = 0.25, this represents only –log2(0.25) = 2 bits of information against the hypothesis (like getting 2 heads in a trial of “fairness” with only 2 coin tosses).
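
To make recommendations 4 and 5 concrete, here is a minimal sketch under a normal approximation; the estimates and standard errors are hypothetical.

    import math

    def two_sided_p(z):
        # two-sided p-value from a normal z-statistic
        return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

    # Recommendation 4: test the between-group difference directly.
    b_m, se_m = 0.60, 0.28        # hypothetical estimate and SE, males   (p ≈ 0.03)
    b_f, se_f = 0.45, 0.29        # hypothetical estimate and SE, females (p ≈ 0.12)
    diff = b_m - b_f
    se_diff = math.sqrt(se_m ** 2 + se_f ** 2)
    print(two_sided_p(diff / se_diff))   # ≈ 0.71: no clear between-sex difference

    # Recommendation 5: the s-value (surprisal) transform of a p-value.
    for p in (0.03, 0.25):
        s = -math.log2(p)                # bits of information against the hypothesis
        print(f"p = {p}: s ≈ {s:.1f} bits")

Note that the between-sex difference here has p ≈ 0.71 even though one subgroup "reached significance" and the other did not, which is precisely the trap recommendation 4 warns against.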

Betensky, R., The p-Value Requires Context, Not a Threshold

      1. Interpret the p-value in light of its context of sample size and meaningful effect size.
      2. Incorporate the sample size and meaningful effect size into a decision to reject the null hypothesis.

      Anderson, A., Assessing Statistical Results: Magnitude, Precision and Model Uncertainty

      1. Evaluate the importance of statistical results based on their practical implications.
      2. Evaluate the strength of empirical evidence based on the precision of the estimates and the plausibility of the modeling choices.
      3. Seek out subject matter expertise when evaluating the importance and the strength of empirical evidence.

      Heck, P., and Krueger, J., Putting the p-Value in Its Place

1. Use the p-value as a heuristic, that is, as the basis for a tentative inference regarding the presence or absence of evidence against the tested hypothesis.
      2. Supplement the p-value with other, conceptually distinct methods and practices, such as effect size estimates, likelihood ratios, or graphical representations.
      3. Strive to embed statistical hypothesis testing within strong a priori theory and a context of relevant prior empirical evidence.

      Johnson, V., Evidence From Marginally Significant t-Statistics

      1. Be transparent in the number of outcome variables that were analyzed.
      2. Report the number (and values) of all test statistics that were calculated.
      3. Provide access to protocols for studies involving human or animal subjects.
      4. Clearly describe data values that were excluded from analysis and the justification for doing so.
      5. Provide sufficient details on experimental design so that other researchers can replicate the experiment.
      6. Describe only p-values less than 0.005 as being “statistically significant.”

Fraser, D., The p-Value Function and Statistical Inference

      1. Determine a primary variable for assessing the hypothesis at issue.
2. Calculate its well-defined distribution function, respecting continuity.
      3. Substitute the observed data value to obtain the “p-value function.”
      4. Extract the available well-defined confidence bounds, confidence intervals, and median estimate.
      5. Know that you don’t have an intellectual basis for decisions.
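
Under a normal approximation, the p-value function of steps 2–3 and the quantities extracted in step 4 can be sketched as follows; the estimate, the standard error, and the one-sided convention used are all assumptions for illustration.

    from statistics import NormalDist

    N = NormalDist()
    est, se = 1.8, 0.7                    # hypothetical estimate and standard error

    def pvalue_fn(theta):
        # one-sided p-value function of the parameter theta
        return N.cdf((est - theta) / se)

    median_est = est                      # theta at which pvalue_fn(theta) = 0.5
    z975 = N.inv_cdf(0.975)
    lo, hi = est - z975 * se, est + z975 * se   # 95% confidence bounds by inversion
    print(median_est, round(lo, 2), round(hi, 2))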

Rougier, J., p-Values, Bayes Factors, and Sufficiency

      1. Recognize that behind every choice of null distribution and test statistic, there lurks a plausible family of alternative hypotheses, which can provide more insight into the null distribution.

Rose, S., and McGuire, T., Limitations of p-Values and R-Squared for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment

      1. Formulate a clear objective for variable inclusion in regression procedures.
      2. Assess all relevant evaluation metrics.
      3. Incorporate algorithmic fairness considerations.

      7.3 Supplementing or Replacing p

      Blume, J., Greevy, R., Welty, V., Smith, J., and DuPont, W., An Introduction to Second Generation p-Values

      1. Construct a composite null hypothesis by specifying the range of effects that are not scientifically meaningful (do this before looking at the data). Why: Eliminating the conflict between scientific significance and statistical significance has numerous statistical and scientific benefits.
      2. Replace classical p-values with second-generation p-values (SGPV). Why: SGPVs accommodate composite null hypotheses and encourage the proper communication of findings.
      3. Interpret the SGPV as a high-level summary of what the data say. Why: Science needs a simple indicator of when the data support only meaningful effects (SGPV = 0), when the data support only trivially null effects (SGPV = 1), or when the data are inconclusive (0 < SGPV < 1).
      4. Report an interval estimate of effect size (confidence interval, support interval, or credible interval) and note its proximity to the composite null hypothesis. Why: This is a more detailed description of study findings.
      5. Consider reporting false discovery rates with SGPVs of 0 or 1. Why: FDRs gauge the chance that an inference is incorrect under assumptions about the data generating process and prior knowledge.
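
A minimal sketch of the interval-overlap computation behind the SGPV, assuming a one-dimensional effect and a pre-specified null interval; all endpoints are hypothetical.

    def sgpv(lo, hi, null_lo, null_hi):
        # I = (lo, hi) is the interval estimate; H0 = (null_lo, null_hi) is the
        # pre-specified range of scientifically trivial effects.
        overlap = max(0.0, min(hi, null_hi) - max(lo, null_lo))
        len_i, len_h0 = hi - lo, null_hi - null_lo
        if len_i > 2.0 * len_h0:          # very wide interval: cap at 1/2
            return overlap / (2.0 * len_h0)
        return overlap / len_i

    print(sgpv(0.8, 2.4, -0.5, 0.5))      # 0.0 -> data support only meaningful effects
    print(sgpv(-0.3, 0.4, -0.5, 0.5))     # 1.0 -> data support only trivial effects

Values strictly between 0 and 1 indicate inconclusive data, with the cap at 1/2 flagging interval estimates too wide to be informative.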

      Goodman, W., Spruill, S., and Komaroff, E., A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting Its Use

1. Determine how far, in your research context, the true parameter’s value would have to be from exactly equaling the conventional point null hypothesis for that distance to count as meaningfully large or practically significant.
      2. Combine the conventional p-value criterion with a minimum effect size criterion to generate a two-criteria inference-indicator signal, which provides heuristic but nondefinitive evidence for inferring the parameter’s true location.
      3. Document the intended criteria for your inference procedures, such as a p-value cut-point and a minimum practically significant effect size, prior to undertaking the procedure.
      4. Ensure that you use the appropriate inference method for the data that are obtainable and for the inference that is intended.
      5. Acknowledge that every study is fraught with limitations from unknowns regarding true data distributions and other conditions that one’s method assumes.
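
A sketch of the two-criteria signal in item 2, with placeholder cutoffs that would in practice be documented before the analysis, as item 3 requires.

    def hybrid_signal(p_value, effect_estimate, p_cut=0.05, mes=0.5):
        # Require both a small p-value and an estimated effect at least as large
        # as a pre-specified minimum practically significant effect size (MES).
        # The default cutoffs here are illustrative placeholders only.
        return p_value <= p_cut and abs(effect_estimate) >= mes

    print(hybrid_signal(0.03, 0.8))       # True: both criteria met
    print(hybrid_signal(0.03, 0.2))       # False: significant but too small to matter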

      Benjamin, D., and Berger, J., Three Recommendations for Improving the Use of p-Values

      1. Replace the 0.05 “statistical significance” threshold for claims of novel discoveries with a 0.005 threshold and refer to p-values between 0.05 and 0.005 as “suggestive.”
2. Report the data-based odds of the alternative hypothesis to the null hypothesis. If the data-based odds cannot be calculated, then use the p-value to report an upper bound on the data-based odds: 1/(−e · p · ln p).
      3. Report your prior odds and posterior odds (prior odds × data-based odds) of the alternative hypothesis to the null hypothesis. If the data-based odds cannot be calculated, then use your prior odds and the p-value to report an upper bound on your posterior odds: (prior odds) × 1/(−e · p · ln p).
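
The p-value-based bound in items 2 and 3 is easy to compute; a sketch (the bound is valid only for p < 1/e ≈ 0.37):

    import math

    def odds_bound(p):
        # upper bound on the data-based odds (Bayes factor): 1 / (−e · p · ln p)
        return 1.0 / (-math.e * p * math.log(p))

    print(odds_bound(0.05))               # ≈ 2.46
    print(1.0 * odds_bound(0.05))         # posterior-odds bound under even prior odds

With even prior odds, p = 0.05 therefore corresponds to posterior odds of at most about 2.5 to 1 in favor of the alternative, which is one motivation for treating such p-values as merely “suggestive.”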

Colquhoun, D., The False Positive Risk: A Proposal Concerning What to Do About p-Values

1. Continue to provide p-values and confidence intervals. Although they are widely misinterpreted, people know how to calculate them, and they are not entirely useless. Just never use the terms “statistically significant” or “nonsignificant.”
      2. Provide in addition an indication of false positive risk (FPR). This is the probability that the claim of a real effect on the basis of the p-value is in fact false. The FPR (not the p-value) is the probability that your result occurred by chance. For example, the fact that, under plausible assumptions, observation of a p-value close to 0.05 corresponds to an FPR of at least 0.2–0.3 shows clearly the weakness of the conventional criterion for “statistical significance.”
      3. Alternatively, specify the prior probability of there being a real effect that one would need to be able to justify in order to achieve an FPR of, say, 0.05.

      Notes:

There are many ways to calculate the FPR. One, based on a point null and a simple alternative, can be calculated with the web calculator at http://fpr-calc.ucl.ac.uk/. However, other approaches to the calculation of FPR, based on different assumptions, give similar results (Table 1 in Colquhoun 2019).

To calculate the FPR it is necessary to specify a prior probability, and this is rarely known. My recommendation 2 is based on giving the FPR for a prior probability of 0.5. Any higher prior probability of there being a real effect is not justifiable in the absence of hard data. In this sense, the calculated FPR is the minimum that can be expected. More implausible hypotheses would make the problem worse. For example, if the prior probability of there being a real effect were only 0.1, then observation of p = 0.05 would imply a disastrously high FPR = 0.76, and in order to achieve an FPR of 0.05, you’d need to observe p = 0.00045. Others (especially Goodman) have advocated giving likelihood ratios (LRs) in place of p-values. The FPR for a prior of 0.5 is simply 1/(1 + LR), so giving the FPR for a prior of 0.5 is just a more easily comprehensible way of specifying the LR, and should therefore be acceptable to frequentists and Bayesians alike.
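
A sketch of the arithmetic linking the likelihood ratio, the prior, and the FPR; the LR itself is assumed given here (Colquhoun’s paper and web calculator derive it from the observed p-value and the study design).

    def false_positive_risk(lr, prior):
        # Convert a likelihood ratio (in favor of a real effect) and a prior
        # probability of a real effect into posterior odds, then into the FPR.
        posterior_odds = lr * prior / (1.0 - prior)
        return 1.0 / (1.0 + posterior_odds)

    print(false_positive_risk(3.0, 0.5))  # prior 0.5: FPR = 1/(1 + LR) = 0.25
    print(false_positive_risk(3.0, 0.1))  # less plausible hypothesis: FPR = 0.75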

      Matthews, R., Moving Toward the Post p < 0.05 Era via the Analysis of Credibility

      1. Report the outcome of studies as effect sizes summarized by confidence intervals (CIs) along with their point estimates.
      2. Make full use of the point estimate and width and location of the CI relative to the null effect line when interpreting findings. The point estimate is generally the effect size best supported by the study, irrespective of its statistical significance/nonsignificance. Similarly, tight CIs located far from the null effect line generally represent more compelling evidence for a nonzero effect than wide CIs lying close to that line.
      3. Use the analysis of credibility (AnCred) to assess quantitatively the credibility of inferences based on the CI. AnCred determines the level of prior evidence needed for a new finding to provide credible evidence for a nonzero effect.
      4. Establish whether this required level of prior evidence is supported by current knowledge and insight. If it is, the new result provides credible evidence for a nonzero effect, irrespective of its statistical significance/nonsignificance.

      Gannon, M., Pereira, C., and Polpo, A., Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels

      1. Retain the useful concept of statistical significance and the same operational procedures as currently used for hypothesis tests, whether frequentist (Neyman–Pearson p-value tests) or Bayesian (Bayes-factor tests).
      2. Use tests with a sample-size-dependent significance level—ours is optimal in the sense of the generalized Neyman–Pearson lemma.
      3. Use a testing scheme that allows tests of any kind of hypothesis, without restrictions on the dimensionalities of the parameter space or the hypothesis. Note that this should include “sharp” hypotheses, which correspond to subsets of lower dimensionality than the full parameter space.
      4. Use hypothesis tests that are compatible with the likelihood principle (LP). They can be easier to interpret consistently than tests that are not LP-compliant.
      5. Use numerical methods to handle hypothesis-testing problems with high-dimensional sample spaces or parameter spaces.

      Pogrow, S., How Effect Size (Practical Significance) Misleads Clinical Practice: The Case for Switching to Practical Benefit to Assess Applied Research Findings

1. Switch from reliance on statistical or practical significance to the more stringent statistical criterion of practical benefit for (a) assessing whether applied research findings indicate that an intervention is effective and should be adopted and scaled—particularly in complex organizations such as schools and hospitals—and (b) determining whether relationships are sufficiently strong and explanatory to be used as a basis for setting policy or practice recommendations. Practical benefit increases the likelihood that observed benefits will replicate in subsequent research and in clinical practice by avoiding the problems associated with relying on small effect sizes.
      2. Reform statistics courses in applied disciplines to include the principles of practical benefit, and have students review influential applied research articles in the discipline to determine which findings demonstrate practical benefit.
      3. Recognize the need to develop different inferential statistical criteria for assessing the importance of applied research findings as compared to assessing basic research findings.
4. Consider consistent, noticeable improvements across contexts, identified using the quick prototyping methods of improvement science, as a preferable methodology for identifying effective practices rather than relying on RCT methods.
      5. Require that applied research reveal the actual unadjusted means/medians of results for all groups and subgroups, and that review panels take such data into account—as opposed to only reporting relative differences between adjusted means/medians. This will help preliminarily identify whether there appear to be clear benefits for an intervention.

      7.4 Adopting More Holistic Approaches

      McShane, B., Gal, D., Gelman, A., Robert, C., and Tackett, J., Abandon Statistical Significance

      1. Treat p-values (and other purely statistical measures like confidence intervals and Bayes factors) continuously rather than in a dichotomous or thresholded manner. In doing so, bear in mind that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures because they are, among other things, typically defined relative to the generally uninteresting and implausible null hypothesis of zero effect and zero systematic error.
      2. Give consideration to related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. Do this always—not just once some p-value or other statistical threshold has been attained—and do this without giving priority to p-values or other purely statistical measures.
      3. Analyze and report all of the data and relevant results rather than focusing on single comparisons that attain some p-value or other statistical threshold.
      4. Conduct a decision analysis: p-value and other statistical threshold-based rules implicitly express a particular tradeoff between Type I and Type II error, but in reality this tradeoff should depend on the costs, benefits, and probabilities of all outcomes.
      5. Accept uncertainty and embrace variation in effects: we can learn much (indeed, more) about the world by forsaking the false promise of certainty offered by dichotomous declarations of truth or falsity—binary statements about there being “an effect” or “no effect”—based on some p-value or other statistical threshold being attained.
      6. Obtain more precise individual-level measurements, use within-person or longitudinal designs more often, and give increased consideration to models that use informative priors, that feature varying treatment effects, and that are multilevel or meta-analytic in nature.

      Tong, C., Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science

      1. Prioritize effort for sound data production: the planning, design, and execution of the study.
      2. Build scientific arguments with many sets of data and multiple lines of evidence.
      3. Recognize the difference between exploratory and confirmatory objectives and use distinct statistical strategies for each.
      4. Use flexible descriptive methodology, including disciplined data exploration, enlightened data display, and regularized, robust, and nonparametric models, for exploratory research.
      5. Restrict statistical inferences to confirmatory analyses for which the study design and statistical analysis plan are pre-specified prior to, and strictly adhered to during, data acquisition.

      Amrhein, V., Trafimow, D., and Greenland, S., Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis If We Don’t Expect Replication

      1. Do not dichotomize, but embrace variation.

(a) Report and interpret inferential statistics like the p-value in a continuous fashion; do not use the word “significant.”

      (b) Interpret interval estimates as “compatibility intervals,” showing effect sizes most compatible with the data, under the model used to compute the interval; do not focus on whether such intervals include or exclude zero.

      (c) Treat inferential statistics as highly unstable local descriptions of relations between models and the obtained data.

      (i) Free your “negative results” by allowing them to be potentially positive. Most studies with large p-values or interval estimates that include the null should be considered “positive,” in the sense that they usually leave open the possibility of important effects (e.g., the effect sizes within the interval estimates).

      (ii) Free your “positive results” by allowing them to be different. Most studies with small p-values or interval estimates that are not near the null should be considered provisional, because in replication studies the p-values could be large and the interval estimates could show very different effect sizes.

      (iii) There is no replication crisis if we don’t expect replication. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations.

      Calin-Jageman, R., and Cumming, G., The New Statistics for Better Science: Ask How Much, How Uncertain, and What Else Is Known

      1. Ask quantitative questions and give quantitative answers.
      2. Countenance uncertainty in all statistical conclusions, seeking ways to quantify, visualize, and interpret the potential for error.
      3. Seek replication, and use quantitative methods to synthesize across data sets as a matter of course.
      4. Use Open Science practices to enhance the trustworthiness of research results.
      5. Avoid, wherever possible, any use of p-values or NHST.

      Ziliak, S., How Large Are Your G-Values? Try Gosset’s Guinnessometrics When a Little “p” Is Not Enough

      • G-10 Consider the Purpose of the Inquiry, and Compare with Best Practice. Falsification of a null hypothesis is not the main purpose of the experiment or observational study. Making money or beer or medicine—ideally more and better than the competition and best practice—is. Estimating the importance of your coefficient relative to results reported by others, is. To repeat, as the 2016 ASA Statement makes clear, merely falsifying a null hypothesis with a qualitative yes/no, exists/does not exist, significant/not significant answer, is not itself significant science, and should be eschewed.
      • G-9 Estimate the Stakes (Or Eat Them). Estimation of magnitudes of effects, and demonstrations of their substantive meaning, should be the center of most inquiries. Failure to specify the stakes of a hypothesis is the first step toward eating them (gulp).
      • G-8 Study Correlated Data: ABBA, Take a Chance on Me. Most regression models assume “iid” error terms—independently and identically distributed—yet most data in the social and life sciences are correlated by systematic, nonrandom effects—and are thus not independent. Gosset solved the problem of correlated soil plots with the “ABBA” layout, maximizing the correlation of paired differences between the As and Bs with a perfectly balanced chiasmic arrangement.
      • G-7 Minimize “Real Error” with the 3 R’s: Represent, Replicate, Reproduce. A test of significance on a single set of data is nearly valueless. Fisher’s p, Student’s t, and other tests should only be used when there is actual repetition of the experiment. “One and done” is scientism, not scientific. Random error is not equal to real error, and is usually smaller and less important than the sum of nonrandom errors. Measurement error, confounding, specification error, and bias of the auspices are frequently larger in all the testing sciences, agronomy to medicine. Guinnessometrics minimizes real error by repeating trials on stratified and balanced yet independent experimental units, controlling as much as possible for local fixed effects.
• G-6 Economize with “Less is More”: Small Samples of Independent Experiments. Small-sample analysis and distribution theory have an economic origin and foundation: changing inputs to the beer on the large scale (for Guinness, enormous global scale) is risky, with more than money at stake. But smaller samples, as Gosset showed in decades of barley and hops experimentation, do not mean “less than,” and Big Data is in any case not the solution for many problems.
      • G-5 Keep Your Eyes on the Size Matters/How Much? Question. There will be distractions but the expected loss and profit functions rule, or should. Are regression coefficients or differences between means large or small? Compared to what? How do you know?
• G-4 Visualize. Parameter uncertainty is not the same thing as model uncertainty. Does the result hit you between the eyes? Does the study show magnitudes of effects across the entire distribution? Advances in visualization software continue to outstrip advances in statistical modeling, making more visualization a no-brainer.
      • G-3 Consider Posteriors and Priors too (“It pays to go Bayes”). The sample on hand is rarely the only thing that is “known.” Subject matter expertise is an important prior input to statistical design and affects analysis of “posterior” results. For example, Gosset at Guinness was wise to keep quality assurance metrics and bottom line profit at the center of his inquiry. How does prior information fit into the story and evidence? Advances in Bayesian computing software make it easier and easier to do a Bayesian analysis, merging prior and posterior information, values, and knowledge.
      • G-2 Cooperate Up, Down, and Across (Networks and Value Chains). For example, where would brewers be today without the continued cooperation of farmers? Perhaps back on the farm and not at the brewery making beer. Statistical science is social, and cooperation helps. Guinness financed a large share of modern statistical theory, and not only by supporting Gosset and other brewers with academic sabbaticals (Ziliak and McCloskey 2008).
      • G-1 Answer the Brewer’s Original Question (“How should you set the odds?”). No bright-line rule of statistical significance can answer the brewer’s question. As Gosset said way back in 1904, how you set the odds depends on “the importance of the issues at stake” (e.g., the expected benefit and cost) together with the cost of obtaining new material.

      Billheimer, D., Predictive Inference and Scientific Reproducibility

      1. Predict observable events or quantities that you care about.
      2. Quantify the uncertainty of your predictions.

      Manski, C., Treatment Choice With Trial Data: Statistical Decision Theory Should Supplant Hypothesis Testing

      1. Statisticians should relearn statistical decision theory, which received considerable attention in the middle of the twentieth century but was largely forgotten by the century’s end.
      2. Statistical decision theory should supplant hypothesis testing when statisticians study treatment choice with trial data.
      3. Statisticians should use statistical decision theory when analyzing decision making with sample data more generally.

Manski, C., and Tetenov, A., Trial Size for Near-Optimal Choice Between Surveillance and Aggressive Treatment: Reconsidering MSLT-II

      1. Statisticians should relearn statistical decision theory, which received considerable attention in the middle of the twentieth century but was largely forgotten by the century’s end.
      2. Statistical decision theory should supplant hypothesis testing when statisticians study treatment choice with trial data.
      3. Statisticians should use statistical decision theory when analyzing decision making with sample data more generally.

      Lavine, M., Frequentist, Bayes, or Other?

      1. Look for and present results from many models that fit the data well.
      2. Evaluate models, not just procedures.

Ruberg, S., Harrell, F., Gamalo-Siebers, M., LaVange, L., Lee, J., Price, K., and Peck, C., Inference and Decision-Making for 21st Century Drug Development and Approval

1. Apply the Bayesian paradigm as a framework for improving statistical inference and regulatory decision making by using probability assertions about the magnitude of a treatment effect.
      2. Incorporate prior data and available information formally into the analysis of the confirmatory trials.
      3. Justify and pre-specify how priors are derived and perform sensitivity analysis for a better understanding of the impact of the choice of prior distribution.
      4. Employ quantitative utility functions to reflect key considerations from all stakeholders for optimal decisions via a probability-based evaluation of the treatment effects.
      5. Intensify training in Bayesian approaches, particularly for decision makers and clinical trialists (e.g., physician scientists in FDA, industry and academia).

      van Dongen, N., Wagenmakers, E.J., van Doorn, J., Gronau, Q., van Ravenzwaaij, D., Hoekstra, R., Haucke, M., Lakens, D., Hennig, C., Morey, R., Homer, S., Gelman, A., and Sprenger, J., Multiple Perspectives on Inference for Two Simple Statistical Scenarios

      1. Clarify your statistical goals explicitly and unambiguously.
      2. Consider the question of interest and choose a statistical approach accordingly.
      3. Acknowledge the uncertainty in your statistical conclusions.
      4. Explore the robustness of your conclusions by executing several different analyses.
      5. Provide enough background information such that other researchers can interpret your results and possibly execute meaningful alternative analyses.

      7.5 Reforming Institutions: Changing Publication Policies and Statistical Education

      Trafimow, D., Five Nonobvious Changes in Editorial Practice for Editors and Reviewers to Consider When Evaluating Submissions in a Post P < 0.05 Universe

      1. Tolerate ambiguity.
      2. Replace significance testing with a priori thinking.
      3. Consider the nature of the contribution, on multiple levels.
      4. Emphasize thinking and execution, not results.
      5. Consider that the assumption of random and independent sampling might be wrong.

      Locascio, J., The Impact of Results Blind Science Publishing on Statistical Consultation and Collaboration

      For journal reviewers

      1. Provide an initial provisional decision regarding acceptance for publication of a journal manuscript based exclusively on the judged importance of the research issues addressed by the study and the soundness of the reported methodology. (The latter would include appropriateness of data analysis methods.) Give no weight to the reported results of the study per se in the decision as to whether to publish or not.
      2. To ensure #1 above is accomplished, commit to an initial decision regarding publication after having been provided with only the Introduction and Methods sections of a manuscript by the editor, not having seen the Abstract, Results, or Discussion. (The latter would be reviewed only if and after a generally irrevocable decision to publish has already been made.)

      For investigators/manuscript authors

      1. Obtain consultation and collaboration from statistical consultant(s) and research methodologist(s) early in the development and conduct of a research study.
      2. Emphasize the clinical and scientific importance of a study in the Introduction section of a manuscript, and give a clear, explicit statement of the research questions being addressed and any hypotheses to be tested.
      3. Include a detailed statistical analysis subsection in the Methods section, which would contain, among other things, a justification of the adequacy of the sample size and the reasons various statistical methods were employed. For example, if null hypothesis significance testing and p-values are used, presumably supplemental to other methods, justify why those methods apply and will provide useful additional information in this particular study.
4. Submit for publication reports of well-conducted studies on important research issues regardless of findings—for example, even if only null effects were obtained, hypotheses were not confirmed, results merely replicated previous findings, or results were inconsistent with established theories.

      Hurlbert, S., Levine, R., and Utts, J., Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires

      1. Encourage journal editorial boards to disallow use of the phrase “statistically significant,” or even “significant,” in manuscripts they will accept for review.
      2. Give primary emphasis in abstracts to the magnitudes of those effects most conclusively demonstrated and of greatest import to the subject matter.
      3. Report precise p-values or other indices of evidence against null hypotheses as continuous variables not requiring any labeling.
      4. Understand the meaning of and rationale for neoFisherian significance assessment (NFSA).

      Campbell, H., and Gustafson, P., The World of Research Has Gone Berserk: Modeling the Consequences of Requiring “Greater Statistical Stringency” for Scientific Publication

      1. Consider the meta-research implications of implementing new publication/funding policies. Journal editors and research funders should attempt to model the impact of proposed policy changes before any implementation. In this way, we can anticipate the policy impacts (both positive and negative) on the types of studies researchers pursue and the types of scientific articles that ultimately end up published in the literature.

      Fricker, R., Burke, K., Han, X., and Woodall, W., Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban

      1. Use measures of statistical significance combined with measures of practical significance, such as confidence intervals on effect sizes, in assessing research results.
      2. Classify research results as either exploratory or confirmatory and appropriately describe them as such in all published documentation.
      3. Define precisely the population of interest in research studies and carefully assess whether the data being analyzed are representative of the population.
      4. Understand the limitations of inferential methods applied to observational, convenience, or other nonprobabilistically sampled data.

Maurer, K., Hudiburgh, L., Werwinski, L., and Bailer, J., Content Audit for p-Value Principles in Introductory Statistics

      1. Evaluate the coverage of p-value principles in the introductory statistics course using rubrics or other systematic assessment guidelines.
      2. Discuss and deploy improvements to curriculum coverage of p-value principles.
3. Meet with representatives from other departments whose majors take your statistics courses, to make sure that inference is being taught in a way that fits the needs of their disciplines.
      4. Ensure that the correct interpretation of p-value principles is a point of emphasis for all faculty members and embedded within all courses of instruction.

      Steel, A., Liermann, M., and Guttorp, P., Beyond Calculations: A Course in Statistical Thinking

      1. Design curricula to teach students how statistical analyses are embedded within a larger science life-cycle, including steps such as project formulation, exploratory graphing, peer review, and communication beyond scientists.
      2. Teach the p-value as only one aspect of a complete data analysis.
      3. Prioritize helping students build a strong understanding of what testing and estimation can tell you over teaching statistical procedures.
      4. Explicitly teach statistical communication. Effective communication requires that students clearly formulate the benefits and limitations of statistical results.
      5. Force students to struggle with poorly defined questions and real, messy data in statistics classes.
      6. Encourage students to match the mathematical metric (or data summary) to the scientific question. Teaching students to create customized statistical tests for custom metrics allows statistics to move beyond the mean and pinpoint specific scientific questions.

      Gratefully,
      Ronald L. Wasserstein
      American Statistical Association, Alexandria, VA
      ron@amstat.org
      Allen L. Schirm
      Mathematica Policy Research (retired), Washington, DC
      allenschirm@gmail.com
      Nicole A. Lazar
      Department of Statistics, University of Georgia, Athens, GA
      nlazar@stat.uga.edu


      Acknowledgments

      Without the help of a huge team, this special issue would never have happened. The articles herein are about the equivalent of three regular issues of The American Statistician. Thank you to all the authors who submitted papers for this issue. Thank you, authors whose papers were accepted, for enduring our critiques. We hope they made you happier with your finished product. Thank you to a talented, hard-working group of associate editors for handling many papers: Frank Bretz, George Cobb, Doug Hubbard, Ray Hubbard, Michael Lavine, Fan Li, Xihong Lin, Tom Louis, Regina Nuzzo, Jane Pendergast, Annie Qu, Sherri Rose, and Steve Ziliak. Thank you to all who served as reviewers. We definitely couldn’t have done this without you. Thank you, TAS Editor Dan Jeske, for your vision and your willingness to let us create this special issue. Special thanks to Janet Wallace, TAS editorial coordinator, for spectacular work and tons of patience. We also are grateful to ASA Journals Manager Eric Sampson for his leadership, and to our partners, the team at Taylor and Francis, for their commitment to ASA’s publishing efforts. Thank you to all who read and commented on the draft of this editorial. You made it so much better! Regina Nuzzo provided extraordinarily helpful substantive and editorial comments. And thanks most especially to the ASA Board of Directors, for generously and enthusiastically supporting the “p-values project” since its inception in 2014. Thank you for your leadership of our profession and our association.



      https://www.nature.com/nature/for-authors/other-subs#correspondence


      Other types of submissions

      This document provides details of the other material that Nature publishes, in addition to original research Articles.

Authors intending to contribute to any of these sections are advised to read the relevant section in published issues of Nature to gain an idea of which section is most suitable and how to present their work; if they have not published in one of these sections before, they must also read the appropriate section guidelines below before submission.

      Many Nature sections are commission-only and do not accept unsolicited contributions; where applicable this is stated in the section guidelines below. Nature editors cannot give details when declining unsolicited suggestions or contributions.

      All articles for all sections of Nature are considered according to our usual conditions of publication, including being subject to our embargo. All material is considered for publication on the understanding that it is original and that any similar or related material submitted or in press elsewhere is disclosed to Nature at submission.

      Authors of material submitted to any section of Nature must provide a current full postal address, phone, fax and e-mail address. It is helpful if authors note their surname and the section of Nature for which the article is being considered in the subject line of any e-mails they send to Nature.

      News and Commentary

These sections are written and commissioned by Nature’s editors. They do not contain unsolicited material. Information for use by Nature for these sections can be sent via e-mail, with the title of the section in the subject line. Please provide full address and contact details.

      Correspondence

      Correspondence items are ‘letters to the Editor’: brief comments on topical issues of public and political interest, anecdotal material, or readers’ reactions to informal material published in Nature (for example, Editorials, News reports, News Features, Books & Arts reviews, Comment pieces or Correspondence).

Excluded contributions. The Correspondence section does not publish technical comments on peer-reviewed research papers. Please submit these instead to Matters Arising (for author guidelines, see https://www.nature.com/nature/for-authors/matters-arising). Alternatively, please upload such comments at the foot of the Nature paper.

      We do not consider submissions responding to articles published in journals other than Nature.

Correspondence submissions are only rarely peer-reviewed. Contributions that present primary research data are excluded.

      Formatting. To be considered, correspondence submissions must be less than 250 words in length, with up to 3 references. Supplementary material is not permitted. Please include a link to the article under discussion and a full print citation, if applicable. 

Signatories. A Correspondence can be signed by up to 3 people. The section is a forum for readers’ reactions, not for statements by organizations or groups of individuals. In exceptional cases where more than three signatories can be justified, all but the corresponding author will be listed online only. Please supply all correspondents’ postal and e-mail addresses, ORCID iDs and telephone contact numbers, clearly indicating any accents on names or places.

      Note that multiple affiliations cannot be published; please highlight the preferred affiliation.

      If the corresponding author is likely to be away from e-mail in the 4 weeks after submission, please identify an alternative contact.

Submission. Please send submissions to correspondence@nature.com. These may include 100 words outlining the pertinence of the correspondence. Please present the correspondence text in the main body of the e-mail. Attachments are not accepted.

      Proofs. All accepted contributions are edited before publication for accuracy and accessibility. Titles are chosen by the editors. Proofs are sent by e-mail. Occasionally, letters must be cut after proofs have been sent. Nature will endeavour to ensure authors see these changes, but cannot guarantee it.

      Obituaries

      Unsolicited contributions are not accepted; this is a commission-only section covering a very small number of researchers of Nobel or equivalent global impact on science or society.

      Commentary

      Unsolicited contributions are not accepted. This is a commission-only section intended to be accessible and appealing to the whole global Nature readership, of all disciplines.

      Commentary pieces are generally agenda-setting, authoritative, informed and often provocative expert pieces calling for action on topical issues pertaining to scientific research and its political, ethical and social ramifications. They road-map a proposed solution in detail; they do not simply snapshot a problem.

Alternatively, Commentary pieces can be writerly historical narratives or conceptual or philosophical arguments of pressing contemporary relevance, told with authority, colour, vivacity and personal voice. These attempt to bring an original perspective before the widest readership, through erudite reasoning and telling examples.

      Books & Arts

Unsolicited contributions are not accepted. The Books & Arts section of Nature publishes timely reviews of books, as well as art exhibitions, performances and cultural events of interest to leading scientists and policy makers; the section also runs comment pieces on trends in these matters. Reviews and articles are commissioned by Nature’s Books & Arts Editor.

      To be considered for review, books or bound proofs must be sent at least 3 months prior to publication to the Books & Arts Assistant (Nature, The Campus, 4 Crinan Street, London N1 9XW, UK); details of arts events should be e-mailed at least six weeks in advance to naturebooks@nature.com.

      Futures

      Futures is the award-winning science-fiction section of Nature and it accepts unsolicited articles. Each Futures piece should be an entirely fictional, self-contained story of around 850–950 words in length, and the genre should, broadly speaking, be ‘hard’ (that is, ‘scientific’) SF rather than, say, outright fantasy, slipstream or horror. Each item should be sent as a Word document attachment to futures@nature.com, including full contact details and a 30-word autobiographical note to be appended to the story if published.

      We ask contributors not to send presubmission enquiries but to send the whole story. Unsolicited artwork is not considered. Before submitting, prospective authors are advised to read earlier Futures stories at nature.com/futures; selected examples are also available here. More detailed guidelines can be found here.

      News & Views

      These articles inform nonspecialist readers about new scientific advances, as reported in recently published papers (in Nature and elsewhere). This is a commission-only section.

      Review and Perspective articles

      Nature publishes two kinds of review, Review and Perspective articles.

Most articles are commissioned, but authors wishing to submit an unsolicited Review or Perspective must do so by sending a synopsis through our online submission system.

      • The synopsis should outline the basic structure of the article; list the material to be covered with an indication of the proposed depth of coverage; and indicate how the material will be logically arranged.
      • The synopsis should be accompanied by a 300-500 word outline of the background to the topic which summarizes the progress made to date and should also make the case succinctly for publication in a topical, interdisciplinary journal.
      • Synopses prepared at this level of detail enable Nature’s editors to provide editorial input before they commission the article, and can reduce the need for substantial editorial revisions at a later stage.
      • The synopsis should include any very recent, key publications in the area.

      Reviews

      • They focus on one topical aspect of a field rather than providing a comprehensive literature survey.
      • They can be controversial, but in this case should briefly indicate opposing viewpoints. They should not be focused on the author’s own work. Language should be accessible, novel concepts defined and specialist terminology explained.
      • They are peer-reviewed, and are substantially edited by Nature’s editors in consultation with the author.
      • All Reviews start with a 200-word maximum preface, which should set the stage and end with a summary sentence. Please note that the preface will also appear on PubMed and Medline, so it is important that it contains essential key words.
      • Reviews vary in length depending on the topic and should not generally be more than 9 pages long. As a guideline, most reviews should include no more than 150 references. Display items and explanatory boxes (used for explanation of technical points or background material) are welcomed. As a guideline, 5000 words, 4 moderate display items (figures/tables/boxes) and a modest citation list (no more than 150 references) will occupy 9 pages.
      • Highlighted references: for Review and Perspective articles, please write a single sentence, in bold text, beneath each of what you consider to be the most important or relevant 5 to 10 per cent of the references in your list, to explain the significance of the work.
      • The author is responsible for ensuring that the necessary permission has been obtained for the re-use of any figures previously published elsewhere.

      Perspectives

      Perspective articles are intended to provide a forum for authors to discuss models and ideas from a personal viewpoint. They are more forward looking and/or speculative than Reviews and may take a narrower field of view. They may be opinionated but should remain balanced and are intended to stimulate discussion and new experimental approaches.

      Perspectives follow the same formatting guidelines as Reviews. Both are peer-reviewed and edited substantially by Nature’s editors in consultation with the author.

      Analysis

      These articles are published only occasionally. They do not report original data, but are review-based reports including a new analysis of existing data (typically large biological data sets such as genomes, microarrays and proteomics) that lead to a novel, exciting and arresting conclusion. They are peer-reviewed.

      Authors interested in submitting an Analysis should send a synopsis through our online submission system with ‘Analysis:’ inserted before the title.

      Careers

      The Careers section welcomes suggestions for articles, which can be sent by email to the editors at naturejobs@nature.com.

      Technology features

      These articles are news-style reports, and are published a few times a year to review techniques and technologies in fast-moving fields of research. For further information, contact techfeatures@nature.com.

      Outlooks

      Nature Outlooks are supplements to Nature that contain news, features and opinion written and commissioned by the Nature supplements editor. They do not contain unsolicited material. For further information, contact the Outlooks editor on nature@nature.com.

