Experimental Philosophy and the Replication Crisis


The replication crisis in psychology and other fields, in which researchers have found it difficult or impossible to replicate the results of many earlier experiments (see the Reproducibility Project), is now being addressed in experimental philosophy (x-phi), a subfield of philosophy that borrows survey and experimental methods from psychology and whose work may suffer from similar problems.

The X-Phi Replicability Project enlisted 20 teams across 8 countries (over 40 researchers) to conduct replications of a “representative sample” of 40 x-phi studies, and has recently released its results. The project found that x-phi studies “successfully replicated about 70% of the time.”

By way of comparison, the Reproducibility Project was able to replicate findings in only around 35% of a representative sample of psychology studies.

What explains the relatively high replication rate? The authors consider a number of explanations:

  • the effect sizes in x-phi, especially early on, were large, and it has been found that effect sizes are a good predictor of a study’s replicability
  • because x-phi studies are less costly to run and re-run, there is less of a downside to getting results that are not interesting enough to publish, and so there is less motivation to engage in “questionable research practices”
  • the effects studied in x-phi are generally “less subtle” than those studied in psychology and more likely to be affected by factors under the control of the researchers
  • the academic culture of philosophy encourages researchers to be “more sensitive to certain methodological questions, such as what counts as strong evidence for a given claim,” or have “a greater tolerance for negative or null results.” More generally, for a few reasons, philosophers may be less susceptible than psychologists to the pressure of “publish or perish” when it comes to empirical studies.

You can read more about the results here.

(via Florian Cova)

Vivian Maier, “Infinite Reflection”

 

Wesley Buckwalter
3 years ago

I think that while it’s likely a combination of things, one factor stands out, namely the benefit of a guiding foundation from centuries of theorizing in philosophy. Strikes me as a testament to progress that can be made with philosophically informed science and vice versa.

Nicolas Delon
3 years ago

Kudos to the project—we should also note that the project would be no less admirable had it revealed that X Phi does worse than psych.

These are four plausible explanations. Could it also be that many (though far from all) of the replicated studies were designed as the replication crisis was already unfolding and causing some self-awareness among researchers? Not very confident in the hypothesis, but it could be that the fourth proposed explanation, philosophers’ methodological concern, was bolstered by the surrounding crisis.

Nick
3 years ago

This is truly helpful. From now on, when I hear an argument from an experimental philosopher, I can rationally multiply my credence in their empirical premises by 0.7.

Kenny Easwaran
Reply to  Nick
3 years ago

Probably a higher factor than you should use for most philosophical premises!

David Wallace
Reply to  Nick
3 years ago

You should rationally *set* your credence to 0.7, I think. And then Kenny Easwaran’s point looks pretty persuasive!

Marshall
Reply to  Nick
3 years ago

So when we subject the basis of this rule to itself, do we actually set our credence at .49 or at .79?

Edward Teach
3 years ago

We’re interested in why x phi has higher replication than other fields. I’d say part of the explanation is that the results in this field are relatively low hanging fruit – since x phi is largely a recent endeavour, the ‘easy’ findings haven’t yet been picked up, in contrast to psychology where there’s been a century to conduct empirical studies.

To put it another way, I predict that in, say, 20 years’ time, there will be a lot fewer x phi results with significant effect sizes.

wes
Reply to  Edward Teach
3 years ago

And yet it is precisely many of the low hanging findings in psychology textbooks that don’t replicate.

Florian Cova
3 years ago

@Edward Teach:
Yep. That’s actually a possibility we discuss in the paper, looking at data suggesting that effect sizes might indeed decrease with time.

afrie
3 years ago

https://experimentaleconreplications.com/

Experimental economics replication rate approx. 61%. I would have thought it would be higher, since experimental economists use monetary rewards to induce subjects to care/try.

Concerned xphi consumer
3 years ago

I have some worries about the “cheapness” of running x-phi experiments. The authors pitch this as a benefit, since “null results that are clearly or at least plausibly due to weaknesses in the study design can be discarded without too much anguish” (p.38). I understand the rationale of discarding findings due to “weaknesses in the study design,” but this kind of rationale could easily be generated post-hoc to explain away inconvenient findings. This is exacerbated by the fact that so many of these studies are content-based, and allow for significant degrees of freedom on the part of the experimenter to tweak the design until they get the result they are hoping for. In other words, x-phi seems like it is prone to file-drawer problems. How many of these experiments are we *not* hearing about?
On a related note, what are pre-registration practices like in experimental philosophy? I imagine it probably wasn’t common in the early days, but perhaps more so now. Also, do philosophy journals that publish lots of x-phi have policies regarding pre-registration? If they don’t, they probably should…

Florian Cova
Reply to  Concerned xphi consumer
3 years ago

I think this is a fair worry, but that kind of self-counters. Because running XPhi studies is cheap, other people/detractors will also have an easy time running studies showing that prior studies are biased by “tweaking”. I think that this kind of discussion is basically what has driven most XPhi on free will in the beginning (“you use only concrete vignettes”, “you only obtain your results because your description of determinism is confusing”, etc.)

As for the pre-registration policies, I don’t know of any policy in philosophy journals yet, so this is still left open to individuals. As for me, I now pre-register most of my studies, and those that are not pre-registered are still made public after they have been run.

Concerned xphi consumer
Reply to  Florian Cova
3 years ago

Florian, thanks for your reply. You may be right that such self-correction occurs in some cases. But hoping that experimental philosophers – many of whom regularly collaborate with one another – will simply police themselves seems like a rather inefficient way to address the problem I’m raising. I think the real solution is rigorous transparency on the part of researchers, in the form of pre-registration practices and the open dissemination of null results. It’s great that you already engage in such practices, and I hope that your co-authors do as well.

Michael Flynn
2 years ago

A 70% replication rate is an objective failure. The P-values used to support or reject null hypotheses are consistent with a 95% replication rate, if the results are externally valid.
Yes, this X-Phi replication study showed a much higher replication rate than a replication study in psychology. However, the two replication studies used different sampling techniques. A tentative conclusion to draw is that the replication crisis in psychology may be worse than in X-Phi, but the poor replication rates indicate that both fall short of the statistical and ethical standards expected of experimental science.