Philosophy’s Digital Future (guest post)


“The crucial question for any academic system is how filtering works. Information is cheap. What we want is some way to identify the most valuable information.”

In the following guest post, Richard Y. Chappell, associate professor of philosophy at the University of Miami, discusses how new technologies could facilitate better publication and research systems.

(A version of this post first appeared at Good Thoughts.)


Philosophy’s Digital Future:
How technology could transform academic research
by Richard Y. Chappell

Our current system for academic publishing strikes me as outdated. The ‘filter then publish’ model was designed for a non-digital world of high publication costs. Online publishing removes that constraint, enabling the shift to a superior ‘publish then filter’ model. What’s more: future advances in AI will make it easier to “map” our collective knowledge, identifying the most important contributions and highlighting gaps where more work is needed. Putting the two together yields a vision of a future academic system that seems far better suited to advancing our collective understanding than our current system.

Mapping the Literature

Imagine having access to an accurate synthesis of the academic literature, viewable at varying degrees of detail, mapping out everything from (a) the central positions in a debate, and the main arguments for and against each candidate position, to (z) the current status of the debate down to the n-th level of replies to replies to sub-objections. Such a comprehensive mapping would be far too much work for any human to do (though the high-level summaries of a debate offered in “survey” papers can be very helpful, they are inevitably far from complete, and may be tendentious). And current-generation LLMs don’t seem capable of reliably accurate synthesis. But presumably it’s just a matter of time. Within a decade or two (maybe much less), AIs could produce this mapping for us, situating (e.g.) every paper in the PhilPapers database according to its philosophical contributions and citation networks.

You could see at a glance where the main “fault lines” lie in a debate, and which objections remain unanswered. This opens up new ways to allocate professional esteem: incentivizing people to plug a genuine gap in the literature (or to generate entirely new branches), and not just whatever they can sneak past referees. This in turn could remedy the problem of neglected objections (and general lack of cross-camp engagement) that I’ve previously lamented, and encourage philosophical work that is more interesting and genuinely valuable.

Publish then filter

Suppose that your paper gets “added to the literature” simply by uploading it to PhilPapers. The PhilAI then analyzes it and updates the PhilMap accordingly. So far, no referees needed.

The crucial question for any academic system is how filtering works. Information is cheap. What we want is some way to identify the most valuable information: the papers of greatest philosophical merit (on any given topic) that are worth reading, assigning, and esteeming. Currently we rely on hyper-selective prestigious journals to do much of this filtering work for us, but I think they’re not very good at this task. Here I’ll suggest two forms of post-publication filtering that could better help us to identify worthwhile philosophy. (Though let me flag in advance that I’m more confident of the second.)

  1. PhilMap influence

Right now, the main numerical measure of influence is citation counts. But this is a pretty terrible metric: an offhand citation is extremely weak evidence of influence,1 and (in principle) a work could decisively settle a debate and yet secure no subsequent citations precisely because it was so decisive that there was nothing more to say.

An interesting question is whether the PhilAI could do a better job of measuring a contribution’s impact upon the PhilMap. One could imagine getting credit based upon measures of originality (being the first to make a certain kind of move in the debate), significance (productively addressing more central issues, rather than epicycles upon epicycles—unless, perhaps, a particular epicycle looked to be the crux of an entire debate), positive influence (like citation counts try to measure, but more contentful) and maybe even negative influence (if the AI can detect that a certain kind of “discredited” move is made less often following the publication of an article explaining why it is a mistake).

If the AI’s judgments are opaque, few may be inclined to defer to its judgments, at least initially. But perhaps it could transparently explain them. Or perhaps we would trust it more over time, as it amassed a reliable-seeming track record. Otherwise, if it’s no better than citation counts, we may need to rely more on human judgment (as we currently do). Still, there’s also room to improve our use of the latter, as per below.

  2. Crowdsourcing peer evaluation

This part doesn’t require AI, just suitable web design. Let anyone write a review of any paper in the database, or perhaps even submit ratings without comments.2 Give users options to filter or adjust ratings in various ways. Options could include, e.g., only counting professional philosophers, filtering by reviewer AOS, and calibrating for “grade inflation” (by adjusting downwards the ratings of those who routinely rate papers higher than other users do, and upwards for those who do the opposite) and “mutual admiration societies” (by giving less weight to reviews by philosophers that the author themselves tends to review unusually generously). Ease of adding custom filters (e.g. giving more weight to “reviewers like me” who share your philosophical tastes and standards) would provide users more options, over time, to adopt the evaluative filters that prove most useful.
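To make the “grade inflation” adjustment concrete, here is a minimal Python sketch. It is only one possible reading of the idea, under stated assumptions: ratings are numeric, a reviewer's “bias” is their average gap from what other users gave the same papers, and the function name `calibrate_ratings` and the sample data are hypothetical illustrations rather than part of any existing PhilPapers feature.

```python
from collections import defaultdict
from statistics import mean


def calibrate_ratings(ratings):
    """ratings: iterable of (reviewer, paper, score) tuples.

    Returns {paper: mean score after removing each reviewer's bias}.
    Assumption: bias = a reviewer's average gap from the mean score
    that *other* users gave to the same papers.
    """
    scores = defaultdict(dict)  # paper -> {reviewer: score}
    for reviewer, paper, score in ratings:
        scores[paper][reviewer] = score

    # Collect, per reviewer, how far each of their scores sits from
    # the mean of everyone else's scores on that paper.
    offsets = defaultdict(list)
    for by_reviewer in scores.values():
        for reviewer, score in by_reviewer.items():
            others = [s for r, s in by_reviewer.items() if r != reviewer]
            if others:  # papers with a single rater tell us nothing about bias
                offsets[reviewer].append(score - mean(others))

    bias = {r: mean(gaps) for r, gaps in offsets.items()}

    # Shift inflated raters down and harsh raters up, then average per paper.
    return {
        paper: mean([s - bias.get(r, 0.0) for r, s in by_reviewer.items()])
        for paper, by_reviewer in scores.items()
    }


# Hypothetical data: "generous" rates high, "strict" rates low.
sample = [
    ("generous", "A", 9), ("strict", "A", 5), ("mid", "A", 7),
    ("generous", "B", 8), ("strict", "B", 4),
    ("generous", "C", 10),  # only the generous reviewer rated paper C
]
calibrated = calibrate_ratings(sample)
```

With this sample, paper C's raw score of 10 is pulled down to 6.5, because its only rater has been running about 3.5 points above other users. A real filter would need more care (for instance, with reviewers who have rated very few papers), and the “mutual admiration” adjustment would need a separate weighting step.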

Then iterate. Reviews are themselves philosophical contributions that can be reviewed and rated. Let authors argue with their reviewers, and try to explain why they think the other’s criticisms are misguided. Or take the critiques on board and post an updated version of the paper, marking the old review as applying to a prior version, and inviting the referee to (optionally) update their verdict of the current version. (Filters could vary in how much weight they give to “outdated” ratings that aren’t confirmed to still apply to new versions, possibly varying depending on how others’ ratings of the two versions compare, or on whether third parties mark the review as “outdated” or “still relevant”.) Either way, the process becomes more informative (and so, one hopes, likely more accurate).3

Instead of journals, anyone—or any group—can curate lists of “recommended papers”.4 The Journal of Political Philosophy was essentially just “Bob’s picks”, after all. There’s no essential reason for this curation role to be bundled with publication. As with journal prestige, curators would compete to develop reputations for identifying the best “diamonds in the rough” that others overlook. Those with the best track records would grow their followings over time, and skill in reviewing and curation—as revealed by widespread following and deference in the broader philosophical community—could be a source of significant professional esteem (like being a top journal editor today). Some kind of visible credit could go to the reviewers and curators who first signal-boost a paper that ends up being widely esteemed. (Some evaluative filters might seek to take into account reviewer track record in this way, giving less weight to those whose early verdicts sharply diverge—in either direction—from the eventual consensus verdicts.)
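As a rough illustration of a track-record filter, the Python sketch below weights reviewers by how closely their early verdicts matched the eventual consensus. The weighting formula `1 / (1 + average gap)` and the name `track_record_weights` are assumptions chosen for simplicity; the proposal itself does not specify any particular formula.

```python
from statistics import mean


def track_record_weights(early_verdicts, consensus):
    """early_verdicts: {reviewer: {paper: early score}}
    consensus: {paper: eventual consensus score}

    Returns {reviewer: weight in (0, 1]}. Divergence in either
    direction (too high or too low) lowers the weight equally.
    """
    weights = {}
    for reviewer, verdicts in early_verdicts.items():
        gaps = [
            abs(score - consensus[paper])
            for paper, score in verdicts.items()
            if paper in consensus  # ignore papers with no settled verdict yet
        ]
        # Reviewers with no settled papers keep a neutral weight of 1.0.
        weights[reviewer] = 1.0 / (1.0 + mean(gaps)) if gaps else 1.0
    return weights


# Hypothetical data: "ann" anticipated the consensus, "bob" did not.
early = {
    "ann": {"p1": 8, "p2": 6},
    "bob": {"p1": 2, "p2": 10},
    "cat": {"p3": 7},  # p3 has no consensus yet
}
settled = {"p1": 8, "p2": 6}
weights = track_record_weights(early, settled)
```

Here “ann” keeps full weight, “bob” drops to one sixth, and “cat” stays neutral until one of her reviewed papers has a consensus verdict. Other filters could reasonably choose a gentler or harsher penalty.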

One could also introduce academic prediction markets (e.g. about how well-regarded a paper will be in X years’ time) to incentivize better judgments.

PhilMap Evaluative Filters

Combining these two big changes: users could then browse an AI-generated “map” of the philosophical literature, using their preferred evaluative filters to highlight the most “valuable” contributions to each debate—and finding the “cutting edges” to which they might be most interested in contributing. This could drastically accelerate philosophical progress, as the PhilMap would update much faster than our current disciplinary “conventional wisdom”. It could also help researchers to avoid re-inventing the wheel, focusing instead on areas where more work is truly needed. So there seem clear epistemic benefits on both the “production” and “consumption” sides.

Summary of benefits

  1. The entire system is free and open access.
  2. Users can more easily find whatever valuable work is produced, and understand the big-picture “state of the debate” at a glance.
  3. Valuable work is more likely to be produced, as researchers are given both (i) better knowledge of what contributions would be valuable, and (ii) better incentives to produce valuable work (since it is more likely to be recognized as such).
  4. A small number of gatekeepers can’t unilaterally prevent valuable new work from entering “the literature”. (They also can’t prevent bad new work. But there’s no real cost to that, as the latter is easily ignored.)
  5. It offers a more efficient review process, compared to the current system in which (i) papers might be reviewed by dozens of referees before finally being published or abandoned, and (ii) much of that reviewing work is wasted due to its confidential nature. My described system could solve the “refereeing crisis” (whereby too much work for too little reward currently results in undersupply of this vital academic work—and what is supplied is often of lower quality than might be hoped), thanks to its greater efficiency and publicity.5
  6. It disincentivizes overproduction of low-quality papers: if publication is cheap, it ceases to count for much.
  7. It pushes us towards a kind of pluralism of evaluative standards.6 Currently, publishing a lot in top journals seems the main “measure” of professional esteem. But this is a terrible measure (and I say this as someone who publishes a lot in top journals!). Philosophers vary immensely in their evaluative standards, and it would be better to have a plurality of evaluative metrics (or filters) that reflected this reality. Different departments might value different metrics/filters, reflecting different conceptions of what constitutes good philosophy. If this info were publicly shared, it could help improve “matching” within the profession, further improving job satisfaction and productivity, and reducing “search costs” from people moving around to try to find a place where they really fit.

Objections

Are there any downsides sufficient to outweigh these benefits?

  1. Incentivizing reviews

In response to a similar proposal from Heesen & Bright to shift to post-publication review, Hansson objects that “it is not obvious where that crowd [for crowd-sourced post-publication review] would come from”:

Anyone who has experience of editing knows how difficult it is to get scholars to review papers, even when they are prodded by editors. It is difficult to see how the number of reviews could increase in a system with no such prodding.

There is an obvious risk that the distribution of spontaneous post-publication reviews on sites for author-controlled publication will be very uneven. Some papers may attract many reviews, whereas others receive no reviews at all. It is also difficult to foresee what will happen to the quality of reviews. When you agree to review a paper for a journal in the current system, this is a commitment to carefully read and evaluate the paper as a whole and to point out both its positive and its negative qualities. It is not unreasonable to expect that spontaneous peer reviews in an author-controlled system will more often be brief value statements rather than thorough analyses of the contents.

An obvious solution would be to make submissions of one’s own work to the PhilMap cost a certain number of “reviewer credits”.7 Reviews of a particular paper might earn diminishing credits depending on how many reviews it has already secured. And they might be subject to further quality-adjustments, based on automatic AI analysis and/or meta-crowdsourced up/down votes. Perhaps to earn credits, you need to “commit” to writing a review of an especially substantive and thorough nature. It would be worth putting thought into the best way to develop the details of the system. But I don’t see any insuperable problems here. Further, I would expect review quality to improve significantly given the reputational stakes of having your name publicly attached. (Current referees have little incentive to read papers carefully, and it often shows.)
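The “reviewer credits” idea can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the class name `CreditLedger`, the submission cost of 3 credits, the base reward of 2 credits, and the rule that the k-th review of a paper earns `base_reward / k`. The `quality` multiplier stands in for whatever quality adjustment (AI analysis or up/down votes) the real system might use.

```python
from collections import defaultdict


class CreditLedger:
    """Toy ledger: reviews earn credits, submissions spend them."""

    def __init__(self, submission_cost=3.0, base_reward=2.0):
        self.submission_cost = submission_cost  # hypothetical price per upload
        self.base_reward = base_reward          # hypothetical reward for a first review
        self.balance = defaultdict(float)       # reviewer -> credits held
        self.review_count = defaultdict(int)    # paper -> reviews received so far

    def record_review(self, reviewer, paper, quality=1.0):
        """Credit a review. The k-th review of a paper earns base_reward / k,
        scaled by a quality factor, so already-reviewed papers pay less."""
        k = self.review_count[paper] + 1
        earned = self.base_reward / k * quality
        self.review_count[paper] = k
        self.balance[reviewer] += earned
        return earned

    def submit(self, author):
        """Spend credits to submit a paper; refuse if the author cannot afford it."""
        if self.balance[author] < self.submission_cost:
            return False
        self.balance[author] -= self.submission_cost
        return True


ledger = CreditLedger()
first = ledger.record_review("ann", "paper-1")    # first review of paper-1
second = ledger.record_review("bob", "paper-1")   # second review earns less
too_early = ledger.submit("ann")                  # ann cannot yet afford a submission
ledger.record_review("ann", "paper-2")            # reviewing a neglected paper pays in full
accepted = ledger.submit("ann")
```

Because rewards shrink with each additional review, reviewers are nudged toward papers that have received little attention, which speaks to the worry about uneven review distribution. Free yearly credits for graduate students, or higher prices for senior faculty, could be layered on by adjusting balances or costs per user.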

  2. Transition feasibility

Another worry is simply how to get from here to there. I think the AI-powered PhilMap could significantly help with that transition. Currently, most PhilPapers entries are traditional publications. The PhilMap doesn’t require changing that. But if/as more people (and institutions) started using evaluative filters other than mere journal prestige, the incentive to publish in a journal would be reduced in favor of directly submitting to the PhilMap. And I’d certainly never referee for a journal again once a sufficiently well-designed alternative of this sort was available: I’d much rather contribute to a public review system—I positively enjoy writing critical blog posts, after all! If enough others felt similarly, it’s hard to see how journals could survive the competition.

Of course, this all depends upon novel evaluative metrics/filters proving more valuable than mere journal prestige, inspiring people to vote with their feet. I think journals suck, so this shouldn’t be difficult. But if I’m wrong, the radical changes just won’t take off as hoped. So it seems pretty low-risk to try it and see.

  3. Other objections?

I’m curious to hear what other concerns one might have about the proposed system. There was some past discussion of Heesen & Bright’s proposal on Daily Nous, but I think my above discussion addresses the biggest concerns. I’ve also seen mention of a critical paper by Rowbottom, but my institution doesn’t provide access to the journal it’s in, and the author didn’t bother to post a pre-print to PhilPapers, so I can’t read their criticisms. (Further evidence that the current system is lousy!)


Notes

1. For example, my most-cited paper (on ‘Fittingness’) gets mentioned a lot in passing, but ~zero substantial engagement, whereas I get the sense that ‘Value Receptacles’ and ‘Willpower Satisficing’ have done a lot more to change how others actually think about their respective topics. (And, indeed, I think the latter two are vastly better papers.)

2. Either way, they should flag any potential conflicts of interest (e.g. close personal or professional connections to the author), and others should be able to raise flags when the reviewer themselves fails to do so. Mousing over the reviewer’s name could indicate relevant data about their track record, e.g. professional standing, average ratings that they give to others, etc.

3. Arvan, Bright, & Heesen argue that formal jury theorems support this conclusion. I’m dubious of placing much weight on such arguments: too much depends on whether the background assumptions are actually satisfied. But their “replies to objections” section is worth reading!

4. As with reviewers, curators would need to flag any conflicts of interest (but could do whatever they want subject to offering that transparency).

5. The publicity might deter some grad students and precariously employed philosophers from offering critical reviews (e.g. of work by faculty who could conceivably be on their future hiring committee). But if fewer reviews are needed anyway, those from the securely employed may well suffice. The cowardly might also be mistaken in their assumptions: I’d expect good philosophers to think better of candidates who can engage intelligently (even if critically!) with their work. (But who knows how many people on hiring committees actually meet my expectations for “good philosophers”. Reality may disappoint.)

A second effect of the publicity might be that everyone would be less inclined to write scathingly negative reviews, for fear of making enemies. But that’s probably a good thing. Scathing negative reports are often stupid, and would benefit from having the writers be careful of their reputations. It should always be possible to write an appropriately negative review in such a way as to cause no embarrassment from having one’s name attached to it.

Alternatively, the software might offer some way to anonymize one’s review (subject to checks to ensure that one isn’t abusing anonymity to hide a conflict of interests). Different evaluative filters might then vary in how much weight they give to anonymous vs. named reviews.

6. By this I mean a “descriptive” form of pluralism, i.e. about candidate standards. You don’t have to think the standards are all equal; but you should probably expect other philosophers to disagree with your philosophical values. So I think it’s appropriate to have a plurality of candidate standards available, from which we can argue about which is actually best, rather than pretending that our current measure is actually reliably measuring anything in particular, let alone any shared conception of philosophical merit. (Maybe it generates a shared sense of social status or prestige, which we all then value. But I take that to be a bad thing. It would be better for different subgroups to esteem different philosophers, who better merit it by the locally accepted standards. And for all this to be more transparent.)

7. If we want to reduce the pressure on grad students and the tenuously employed, they could be awarded a limited number of free credits each year, allowing them to submit more and review less. Conversely, the price per submission for senior faculty could increase, reflecting expectations that tenured faculty should shoulder more of the reviewing “burden”.


Related: “‘Hey Sophi’, or How Much Philosophy Will Computers Do?”

9 Comments
James Lincoln
10 days ago

One general concern about AI development is the perpetuation of bias inherited from the material that LLMs are trained on. Granted, the current model for academic publishing has more in common with gatekeeping than with epistemic growth in some important ways, but how do you see the PhilMap model or the publish-then-filter model being affected by structural inequities in LLMs? Even if we overcome them as best we can to be more inclusive of diverse content and methods, should we worry that outsourcing this kind of mapping process separates us too much from the landscape of our field?

Marc Champagne
10 days ago

The obsession with rankings is awful. The obsession with AI is awful. So, the idea of mixing AI and rankings is doubly awful.

Please think long and hard before ditching our current practices (preferably long enough for the drawbacks of AI to become known).

Richard Y Chappell
Reply to  Marc Champagne
10 days ago

Academia depends upon discernment of better vs worse work. I don’t think it’s either accurate or charitable to characterize this as an “obsession with rankings”. I do think it’s worth thinking about whether we can improve the ways we evaluate philosophical work, to better incentivize more philosophically valuable contributions.

I also don’t think it’s either accurate or charitable to dismiss all thinking about possible transformative uses of AI as an “obsession”. I think this stuff is worth thinking about — “long and hard”! — and welcome substantive objections. There are surely drawbacks to my proposal worth considering alongside the benefits that leapt to my attention, and I want to hear them. (But insulting dismissals are less helpful.)

cecil burrow
Reply to  Richard Y Chappell
10 days ago

The distinction between better and worse work can be made without compiling rankings. Rankings are a relatively recent phenomenon, and the distinction between better and worse work is not.

Brian Weatherson
10 days ago

I’m rather perplexed by alleged benefit 6. We know how these kinds of automated ranking systems work – we have 20+ years of experience with Google. And they definitely do not lead to less production of material for the network.

Richard Y Chappell
Reply to  Brian Weatherson
10 days ago

I don’t see much commonality between my proposal and Google pagerank. A major incentive for random websites is advertising revenue. What are you imagining is the corresponding incentive for academic overproduction? I take it to be all about professional incentives: journal publications are currently recognized as an important form of academic achievement. Self-published papers, like blog posts, get no such automatic recognition. Some further validation is needed. If it’s harder for low-quality papers to get any kind of validation, that would reduce the incentive to produce them.

But maybe it wouldn’t be harder. Maybe some relatively lax “evaluative filters” would end up being used by hiring and tenure committees, for example. It just depends on what incentives we end up collectively imposing on this structure. I was implicitly thinking that we would opt for higher standards, that favored quality over quantity. But it isn’t guaranteed, either way, by the nature of the system. It really is just up to us.

Daniel Weltman
10 days ago

I dunno how I feel about the AI stuff, but crowdsourced public peer review on something like PhilPapers has struck me as a good idea for a while. For discussion on the topic see (among other places): https://philosopherscocoon.typepad.com/blog/2021/04/peer-review-keep-change-or-abolish.html

https://philosopherscocoon.typepad.com/blog/2018/07/changing-peer-review-as-good-a-time-as-any.html

Michel Xhignesse
10 days ago

I would like to start seeing refereeing systems that allow referees to take a more active role in deciding what to referee. Something like Dialectica’s fish pond, but for referees. (To be clear, I’d like to see this supplement, rather than replace, the current system of invitations to referee.) I referee a modest amount at present (about 10-15 a year), but I’d happily do more if I wasn’t just waiting for an invitation all the time.

I take it that would be consistent with (2), and I think it would strike a better balance than the usual “pre-print-crowd-source” solution that gets advocated.

grymes
10 days ago

I foresee a barrage of papers about whether PhilMap gets the terrain right.