What are pre-registrations good for? (Absolutely nothing!?) A quick review of some relevant references and results.

Andreas Ortmann
9 min read · Jan 15, 2024


In Data Colada [115], Uri Simonsohn posted in late 2023 a piece on pre-registration prevalence. He motivated it as follows:

Pre-registration is the best and possibly only solution to p-hacking. Ten years ago, pre-registrations were virtually unheard of in psychology, but they have become increasingly common since then. I was curious just how common they have become, and so I collected some data. This post shares the results.

The results themselves are shown in a figure in his post.

The arguments for (and against) pre-registration, and registered reports for that matter, have been laid out in Logg & Dorison (OBHDP 2021) and Chambers & Tzavella (NHB 2022).

At about the same time that Simonsohn updated us on the prevalence of pre-registration in psychology journals, we learned about a study titled “Preregistration in practice: A comparison of preregistered and non-preregistered studies in psychology” in which the authors (van den Akker et al., 2023) find:

Overall, our data indicate that preregistration has beneficial effects in the realm of statistical power and impact, but we did not find robust evidence that preregistration prevents p-hacking and HARKing (Hypothesizing After the Results are Known). (from the abstract of their paper)

So much for pre-registration being the best and possibly only solution to p-hacking, no?!

It certainly is a surprising result, one that contradicts both (my) intuition (when you impose constraints such as pre-registration, it seems self-evident that p-hacking and HARKing will be reduced) and earlier empirical findings. For example, Scheel, Schijen, & Lakens (2021), comparing the standard psychology literature with registered reports, found a dramatic excess of positive results in the former:

We compared the results in published RRs (N = 71 as of November 2018) with a random sample of hypothesis-testing studies from the standard literature (N = 152) in psychology. Analyzing the first hypothesis of each article, we found 96% positive results in standard reports but only 44% positive results in RRs. (from the abstract of their paper)

This seems very much in line with (my) intuition.
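For a rough sense of how stark that gap is, here is a back-of-the-envelope two-proportion comparison using counts implied by the rounded percentages in their abstract (roughly 146 of 152 standard reports positive versus 31 of 71 registered reports). This is a sketch based on those approximations, not a reanalysis of their data.

```python
# Back-of-the-envelope two-proportion comparison of positive-result rates,
# using counts implied by the rounded percentages reported by
# Scheel, Schijen & Lakens (2021): ~96% of 152 standard reports and
# ~44% of 71 registered reports. Approximate counts, not their raw data.
import numpy as np
from scipy.stats import norm

pos_std, n_std = 146, 152  # approx. 96% positive, standard literature
pos_rr, n_rr = 31, 71      # approx. 44% positive, registered reports

p_std, p_rr = pos_std / n_std, pos_rr / n_rr
p_pool = (pos_std + pos_rr) / (n_std + n_rr)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_std + 1 / n_rr))
z = (p_std - p_rr) / se
p_value = 2 * norm.sf(abs(z))

print(f"standard: {p_std:.2f}, RR: {p_rr:.2f}, z = {z:.1f}, p = {p_value:.1e}")
```

Even with these rough counts, the gap is on the order of nine standard errors, so rounding in the reported percentages does not change the picture.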

Granted, registered reports are not the same as pre-registration. The latter is, however, a necessary ingredient of registered reports, which pair pre-registration with peer review and an in-principle decision to publish before the results are known.

Relatedly, Kvarven et al. (NHB 2019, correction 2020), comparing meta-analyses and preregistered multiple-laboratory replication projects, find:

The multiple-laboratory replications provide precisely estimated effect sizes that do not suffer from publication bias or selective reporting. We searched the literature and identified 15 meta-analyses on the same topics as multiple-laboratory replications. We find that meta-analytic effect sizes are significantly different from replication effect sizes for 12 out of the 15 meta-replication pairs. These differences are systematic and, on average, meta-analytic effect sizes are almost three times as large as replication effect sizes. (from their original abstract)

The van den Akker et al. results are thus a puzzle and, as the authors write themselves, results that came “unexpectedly” (see their conclusion and discussion). It is likely that selection bias confounds their results. They report, for example, that pre-registered studies “more often contained a power analysis and larger sample sizes than non-preregistered studies. … and that pre-registered studies had a greater impact in terms of citations, Altmetric Attention Score, and journal impact factor than non-preregistered studies.” (from the conclusion and discussion section)

Plus, there is the interesting question of how one can reliably determine, for non-preregistered studies, the degree to which p-hacking and HARKing did indeed happen.

Update 2024_01_23: I just came across this paper: Brodeur, Abel, Nikolai Cook, Jonathan Hartley, and Anthony Heyes, “Do Pre-Registration and Pre-Analysis Plans Reduce p-Hacking and Publication Bias?” (December 15, 2022).

Randomized controlled trials (RCTs) are increasingly prominent in economics, with pre-registration and pre-analysis plans (PAPs) promoted as important in ensuring the credibility of findings. We investigate whether these tools reduce the extent of p-hacking and publication bias by collecting and studying the universe of test statistics, 15,992 in total, from RCTs published in 15 leading economics journals from 2018 through 2021. In our primary analysis, we find no meaningful difference in the distribution of test statistics from pre-registered studies, compared to their non-pre-registered counterparts. However, pre-registered studies that have a complete PAP are significantly less p-hacked. These results point to the importance of PAPs, rather than pre-registration in itself, in ensuring credibility. (from their abstract)

This paper has now been published in the Journal of Political Economy Microeconomics.
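The Brodeur et al. evidence comes from examining the distribution of published test statistics. As an illustration of the general idea behind such diagnostics (not their actual procedure), here is a minimal caliper-style check on simulated z-statistics: if results are being nudged across the significance threshold, there should be excess mass just above z = 1.96 relative to just below it. The data, window width, and test below are assumptions chosen purely for illustration.

```python
# Minimal caliper-style check for bunching of test statistics just above
# the 5% threshold (|z| = 1.96). Illustration only: simulated data, not
# the procedure used by Brodeur et al.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)

# Simulated |z|-statistics: a smooth "honest" distribution plus a small
# hypothetical clump nudged just over the threshold.
honest = np.abs(rng.normal(loc=0.8, scale=1.0, size=2000))
nudged = rng.uniform(1.97, 2.05, size=60)
z = np.concatenate([honest, nudged])

# Count statistics in a narrow window just below vs. just above 1.96.
width = 0.10  # arbitrary caliper width chosen for illustration
below = int(np.sum((z > 1.96 - width) & (z <= 1.96)))
above = int(np.sum((z > 1.96) & (z <= 1.96 + width)))

# Under no bunching, a statistic landing in the window is about equally
# likely to fall on either side of the threshold.
result = binomtest(above, above + below, p=0.5, alternative="greater")
print(f"just below: {below}, just above: {above}, p = {result.pvalue:.4f}")
```

On “honest” data the two counts should be roughly balanced within a narrow window; a heavily lopsided split just above the threshold is the kind of pattern that p-hacking and publication-bias diagnostics look for.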

As an aside, there’s a (widespread) misconception that pre-registration prevents you from reporting what you did not pre-register. Not so. See this recent pre-print:

When and How to Deviate from a Preregistration (PsyArXiv preprint, osf.io)

The point of pre-registration is not to report only what you pre-registered but to separate clearly what you pre-registered from what you did not, with the latter flagged as exploratory and ex post.

I personally have pre-registered all my studies since 2019 or so. In my view the benefits outweigh the costs.

Update 2024_09_24: Today news broke of a high-visibility retraction in Nature Human Behaviour.

This retraction had been coming for a long time; see the reference in the retraction notice.

Here are excerpts from Jessica Hullman’s 27 March 2024 contribution to Gelman’s blog Statistical Modelling etc.

… one of the questions raised by Bak-Coleman and Devezer about the published version was about their claim that all of the confirmatory analyses they present were preregistered. There was no such preregistration in sight if you checked the provided OSF link. I remarked back in November that even in the best case scenario where the missing preregistration was found, it was still depressing and ironic that a paper whose message is about the value of preregistration could make claims about its own preregistration that it couldn’t back up at publication time.

It seems clear that the dishonesty here was in service of telling a compelling story about something. I’ve seen things like this transpire plenty of times: the goal of getting published leads to attempts to find a good story in whatever results you got. Combined with the appearance of rigor and a good reputation, a researcher can be rewarded for work that on closer inspection involves so much post-hoc interpretation that the preregistration seems mostly irrelevant. It’s not surprising that the story here ends up being one that we would expect some of the authors to have faith in a priori.

What do I care? Why should you?

On some level, the findings the paper presents — that if you use large studies and attempt to eliminate QRPs, you can get a high rate of statistical significance — are very unsurprising. So why care if the analyses weren’t exactly decided in advance? Can’t we just call it sloppy labeling and move on?

I care because if deception is occurring openly in papers published in a respected journal for behavioral research by authors who are perceived as champions of rigor, then we still have a very long way to go. Interpreting this paper as a win for open science, as if it cleanly estimated the causal effect of rigor-enhancing practices is not, in my view, a win for open science. The authors’ lack of concern for labeling exploratory analysis as confirmatory, their attempt to spin the null findings from the intended study into a result about effects on replicability even though the definition they use is unconventional and appears to have been chosen because it led to a higher value, and the seemingly selective summary of prior replication rates from the literature should be acknowledged as the paper accumulates citations. At this point months have passed and there have not been any amendments to the paper, nor admission by the authors that the published manuscript makes false claims about the preregistration status. Why not just own up to it?

It’s frustrating because my own methodological stance has been positively impacted by some of these authors. I value what the authors call rigor-enhancing practices. In our experimental work, my students and I routinely use preregistration, we do design calculations via simulations to choose sample sizes, we attempt to be transparent about how we arrive at conclusions. I want to believe that these practices do work, and that the open science movement is dedicated to honesty and transparency. But if papers like the Nature Human Behavior article are what people have in mind when they laud open science researchers for their attempts to rigorously evaluate their proposals, then we have problems.

There are many lessons to be drawn here. When someone says all the analyses are preregistered, don’t just accept them at their word, regardless of their reputation. Another lesson that I think Andrew previously highlighted is that researchers sometimes form alliances with others that may have different views for the sake of impact but this can lead to compromised standards. Big collaborative papers where you can’t be sure what your co-authors are up to should make all of us nervous. Dishonesty is not worth the citations.

Some of the commentary on her blog entry is worth reading.

Update 2024_09_25: Here are Jessica Hullman’s ruminations from yesterday on the same blog. Worth a read.

Update 2024_10_22: The man himself, Andrew Gelman, posted on 26 September 2024 a long summary of the events titled “What’s the story behind that paper by the Center for Open Science team that just got retracted?” He makes the important point that “most science (and engineering) is all about theory and measurement, not procedure.” Procedure here refers to procedural reforms such as pre-registration, p-value or Bayes factor thresholds, and changes in the processes of scientific publication.

Some of the commentary on his blog entry is worth reading.

On 14 October 2024, Holly Else provided in Nature her take on this situation, titled “‘Doing good science is hard’: retraction of a high-profile reproducibility study prompts soul-searching. A paper by some of the biggest names in scientific integrity is retracted for issues including misstatements about the research plan.” Pre-registration challenges feature prominently in her discussion.

In Data Colada [119], Uri Simonsohn discussed a hidden confound in a Psych Methods pre-registrations critique. The Psych Methods authors compared “the subjective codings and concluded that many pre-registrations in psychology, especially those relying on the AsPredicted template, provide insufficient information.” Given that I have used the AsPredicted template as my preferred pre-registration instrument, that piqued my interest. It is of course also of some interest to Simonsohn, as he has been the major driver behind the AsPredicted website.

Simonsohn identifies two issues with the Psych Methods paper. First, the authors do not clearly disclose (not to say: hide) that the samples of AsPredicted pre-registrations and OSF pre-registrations were constructed differently. Second, Simonsohn argues that pre-registrations should not be evaluated in isolation but should always be read in the context of the paper for which they were written. The first problem is the more damaging in the present context; the second, though, is one that deserves continued discussion.

Simonsohn also discusses the AsPredicted pre-registration that the Psych Methods authors rate the worst and finds that it does actually put considerable constraints on the authors (e.g., it allows readers to identify potential p-hacking or HARKing). An interesting discussion.

Update 2024_11_25: On 19 November 2024, Joe B. Bak-Coleman returned to the fray, provoked by what he characterizes as (defamatory) misrepresentations by Nosek. Judge for yourself.

It would be great to drop this and not write about it ever again. But here’s the rub: We were asked by the editors to summarize the reason for retraction in our matters arising, and in misrepresenting why the paper was retracted the authors are publicly claiming my published work is incorrect. Here it is on display in Nature [screenshot]

Here, Nosek attributes the findings of the investigation to us. The investigation involved four domain experts, several editors, and a whole bunch of higher-ups in Springer Nature (as is policy). The result is that a very well known scientist is publicly asserting my work isn’t robust. Critically, they have offered precisely zero evidence to support their claims. Here, we’ll go through the relevant evidence that is publicly available. Instead, as best as I can tell, they are just lying. Given this has now led to misleading articles in multiple higher-profile venues, it’s probably time to set the record straight.

To be continued, for certain.


Written by Andreas Ortmann

EconProf: I post occasionally on whatever tickles my fancy: Science, evidence production, the Econ tribe, Oz politics, etc. Y’all r entitled to my opinions …
