Here are a few paragraphs on the replication crisis for your reflection from the excellent 2020 book Science Fictions by Stuart Ritchie:
…Replication, then, has long been a key part of how science is supposed to work – and incidentally, it’s another of its social aspects, with results only being taken seriously after they’ve been corroborated by multiple observers. But somewhere along the way, between Boyle and modern academia, a great many scientists forgot about the importance of replication. In the collision of our Mertonian ideals with the realities of the scientific publication system – not to mention the realities of human nature – the ideals have proven the more fragile, leaving us with a scientific literature full of untrustworthy, unreliable, unreplicable studies that often do more to confuse than enlighten. In the next chapter, we’ll see just how untrustworthy, unreliable and unreplicable the scientific literature has become…
Ritchie, Stuart. Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth (p. 24). Henry Holt and Co… Kindle Edition.
As you might expect, the confluence of failed replications (like the priming studies) and bizarre results (like Bem’s paranormal discoveries), along with revelations of misrepresentation (like Zimbardo’s experiment) and fraud (like Stapel’s fake data) spooked psychologists. Just how many of the studies in their field, they wondered, could be trusted? To get an idea of how bad things were, they started banding together to run large-scale replications of prominent studies across multiple different labs. The highest profile of these involved a large consortium of scientists who chose 100 studies from three top psychology journals and tried to replicate them. The results, published in Science in 2015, made bitter reading: in the end, only 39 percent of the studies were judged to have replicated successfully.26 Another one of these efforts, in 2018, tried to replicate twenty-one social-science papers that had been published in the world’s top two general science journals, Nature and Science.27 This time, the replication rate was 62 per cent. Further collaborations that looked at a variety of different kinds of psychological phenomena found rates of 77 per cent, 54 per cent, and 38 per cent.28 Almost all of the replications, even where successful, found that the original studies had exaggerated the size of their effects. Overall, the replication crisis seems, with a snap of its fingers, to have wiped about half of all psychology research off the map.29
Maybe it’s not quite that bad, for two reasons. First, we would expect some results that really are solid to fail to replicate sometimes, merely due to bad luck.30 Second, some replications might have failed due to their being run with slight changes to the methodology from the original (though if a result is fragile enough that it disappears after minor modifications to the experiment, one might wonder how useful or meaningful it really is).31 For these reasons, it’s sometimes tricky to decide whether a finding is ‘replicable’ or not based just on one or two replication attempts. What’s more, the replication rate seems to differ across different areas of psychology: for example, in the 2015 Science paper, cognitive psychology (studies of memory, perception, language, and so on) did better than social psychology (which includes the sorts of metaphor-priming studies we saw above).32 In general, though, the effect on psychology has been devastating. This wasn’t just a case of fluffy, flashy research like priming and power posing being debunked: a great deal of far more ‘serious’ psychological research (like the Stanford Prison Experiment, and much else besides) was also thrown into doubt. And neither was it a matter of digging up some irrelevant antiques and performatively showing that they were bad – like when Pope Stephen VI, in the year 897, exhumed the corpse of one of his predecessors, Pope Formosus, and put it on trial (it was found guilty). The studies that failed to replicate continued to be routinely cited both by scientists and other writers: entire lines of research, and bestselling popular books, were being built on their foundation. ‘Crisis’ seems to be an apt description…
…Could the sheer complexity of the task make findings in psychology particularly untrustworthy, compared to other sciences?
There is something to this argument: many studies in psychology barely scratch the surface of the phenomena they’re interested in, while other ‘harder’ sciences – say, physics – have better-developed theories and more precise and genuinely objective measurements. But it’s not as if psychology is alone in having problems with replicability – although no other sciences have yet investigated their replication rates as systematically and in as much detail, there are glimmers of the same kinds of problems across very many different fields:
• In economics, a 2016 replication survey of eighteen microeconomic studies (not too different from psychology research, with people coming into the lab and taking part in experiments on their economic behaviour) had a replication rate of only 61 per cent.33
• In neuroscience, a study in 2018 found that standard studies in functional brain imaging, where the brain’s activity is recorded using MRI while the person is completing some kind of task (or just lying in the scanner), were likely only ‘modestly replicable’.34 The world of functional brain-imaging was also rocked by a paper which revealed that a default setting in a software package commonly used to analyse imaging data had a statistical error. It led to a vast number of accidental, uncorrected false-positive results, and it might have compromised around 10 per cent of all studies that had ever been published on the topic.35
• In evolutionary biology and ecology, a series of classic findings, repeated in textbooks and taught to generations of students, have fallen to replication attempts and critical reviews. For instance, the famous ‘domestication syndrome’, where Russian foxes that were selected for tameness started to take on the physical characteristics of domesticated species (like floppy ears and wider faces), turns out to have been hugely exaggerated, with most of the ‘domesticated’ traits existing before the selection even took place.36 And much of what we thought we knew about sexual selection in birds has been shot down by better evidence. For instance, despite what we thought we knew, putting a red band on male finches’ legs probably doesn’t make them super-attractive to females; male sparrows with larger patches of black plumage on their throats (their so-called ‘bib’) probably don’t have higher dominance in the flock; and the evidence that female blue tits are more attracted to particular plumage colours in males is inconclusive.37
• In marine biology, a massive replication attempt in 2020 found that the effects of ocean acidification (one of the consequences of climate change) on fish behaviour were non-existent.38 It thus failed to replicate several highly publicised studies from the previous decade that had apparently shown that more acidic conditions caused fish to become disoriented, and in some cases to swim towards, rather than away from, the chemical cues produced by predators.
• In organic chemistry, the journal Organic Syntheses, which operates an unusual policy where an editorial board member attempts to replicate in their own lab the results of every paper they receive as a submission, reported rejecting 7.5 per cent of submissions because of replication failures.39
There are countless other examples: almost every case I’ll describe in this book involves a scientific ‘finding’ that, upon closer scrutiny, turned out to be either less solid than it seemed, or to be completely untrue. But more worryingly still, these examples are drawn just from the studies that have received that all-important scrutiny. These are just the ones we know about. How many other results, we must ask ourselves, would prove unreplicable if anyone happened to make the attempt?
One reason that we live in such uncertainty is that, as we learned from the Preface, hardly anyone runs replication studies. Though we don’t have the numbers for most fields, scans of the literature from certain subjects draw a bleak conclusion. In economics, a miserable 0.1 per cent of all articles published were attempted replications of prior results; in psychology, the number was better, but still nowhere near good, with an attempted replication rate of just over 1 per cent.40 If everyone is constantly marching onwards to new findings without stopping to check if our previous knowledge is robust, is the above list of replication failures that much of a surprise?
Here’s something that’s perhaps even more alarming. You’d think that if you obtained the exact same dataset as was used in a published study, you’d be able to derive the exact same results that the study reported. Unfortunately, in many subjects, researchers have had terrible difficulty with this seemingly straightforward task. This is a problem sometimes described as a question of reproducibility, as opposed to replicability (the latter term being usually reserved to mean studies that ask the same questions of different data)…
Ritchie, Stuart. Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth (p. 31). Henry Holt and Co… Kindle Edition.
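One point in the passage above is easy to put numbers on: a perfectly real effect will still fail to replicate some of the time, purely through sampling luck. Here is a minimal sketch of that idea, not anything taken from the book. It uses a normal approximation to the power of a two-sided, two-group comparison, and the effect size (0.5), the group size (64) and the function names are all my own hypothetical choices for illustration.

```python
from statistics import NormalDist


def replication_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate chance that a replication of a TRUE effect reaches significance.

    Normal approximation to a two-sided, two-sample test. `effect_size` is a
    standardized mean difference (hypothetical value, not from the book).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def expected_failures(n_studies: int, power: float) -> float:
    """Expected number of true effects that fail to replicate by bad luck alone."""
    return n_studies * (1 - power)


# Assumed scenario: a medium effect, 64 participants per group.
power = replication_power(effect_size=0.5, n_per_group=64)
print(f"Chance a real effect replicates: {power:.2f}")
print(f"Expected unlucky failures out of 100 real effects: {expected_failures(100, power):.1f}")
```

Under these assumed numbers the power comes out at roughly 0.8, so something like one in five replications of a genuine effect would be expected to fail. That is why a single failed replication is weak evidence on its own, and also why it cannot explain away replication rates as low as the ones quoted above.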
Here is some further reading:
Replication crisis - Wikipedia
- Bastian H (5 December 2016). “Reproducibility Crisis Timeline: Milestones in Tackling Research Reliability”. Absolutely Maybe. Retrieved 2019-06-05.
- Bonett, D.G. (2021). Design and analysis of replication studies. Organizational Research Methods, 24, 513–529. https://doi.org/10.1177/1094428120911088
- Denworth L (October 2019). “A Significant Problem: Standard scientific methods are under fire. Will anything change?” (PDF). Scientific American. Vol. 321, no. 4. pp. 62–67. p. 63: The use of p values for nearly a century [since 1925] to determine statistical significance of experimental results has contributed to an illusion of certainty and [to] reproducibility crises in many scientific fields. There is growing determination to reform statistical analysis… Some [researchers] suggest changing statistical methods, whereas others would do away with a threshold for defining ‘significant’ results.
- Harris R (2017). Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions. New York: Basic Books. ISBN 9780465097906.
- Kafkafi N, Agassi J, Chesler EJ, Crabbe JC, Crusio WE, Eilam D, et al. (April 2018). “Reproducibility and replicability of rodent phenotyping in preclinical studies”. Neuroscience and Biobehavioral Reviews. 87: 218–232. doi:10.1016/j.neubiorev.2018.01.003. PMC 6071910. PMID 29357292.
- Ritchie S (July 2020). Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth. New York: Metropolitan Books. ISBN 9781250222695. Book Review (November 2020, The American Conservative)
- Whitfield J (October 2021). “Replication Crisis”. London Review of Books. 43 (19): 39–40. review of Ritchie S (July 2020). Science Fictions: Exposing Fraud, Negligence and Hype in Science. London: Bodley Head. ISBN 978-1-84792-565-7.
Renaldo