“AI Is Getting Better at Mind-Reading” is how The New York Times puts it.
The actual study, “Semantic reconstruction of continuous language from non-invasive brain recordings” (by Jerry Tang, Amanda LeBel, Shailee Jain & Alexander G. Huth of the University of Texas at Austin), published in Nature Neuroscience, puts it this way:
We introduce a decoder that takes non-invasive brain recordings made using functional magnetic resonance imaging (fMRI) and reconstructs perceived or imagined stimuli using continuous natural language.
Ultimately, they developed a way to figure out what a person is thinking—or at least “the gist” of what they’re thinking—even if they’re not saying anything out loud, by looking at fMRI data.
Such findings seem relevant to philosophers working across a range of areas.
The scientists used fMRI to record blood-oxygen-level-dependent (BOLD) signals in their subjects’ brains as they listened to hours of podcasts (like The Moth Radio Hour and Modern Love) and watched animated Pixar shorts. They then had to “translate” the BOLD signals into natural language. One thing that made this challenging is that thoughts are faster than blood:
Although fMRI has excellent spatial specificity, the blood-oxygen-level-dependent (BOLD) signal that it measures is notoriously slow—an impulse of neural activity causes BOLD to rise and fall over approximately 10 s. For naturally spoken English (over two words per second), this means that each brain image can be affected by over 20 words. Decoding continuous language thus requires solving an ill-posed inverse problem, as there are many more words to decode than brain images. Our decoder accomplishes this by generating candidate word sequences, scoring the likelihood that each candidate evoked the recorded brain responses and then selecting the best candidate. [references removed]
They then trained an encoding model to score word sequences against subjects’ brain responses, using the data recorded while the subjects listened to the podcasts:
We trained the encoding model on this dataset by extracting semantic features that capture the meaning of stimulus phrases and using linear regression to model how the semantic features influence brain responses. Given any word sequence, the encoding model predicts how the subject’s brain would respond when hearing the sequence with considerable accuracy. The encoding model can then score the likelihood that the word sequence evoked the recorded brain responses by measuring how well the recorded brain responses match the predicted brain responses.
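The encoding model described in that passage can be sketched roughly as follows. This is a minimal illustration, not the authors’ code: the semantic features here are random stand-ins for the embeddings the study extracts, the dimensions are tiny, and the ridge penalty is an assumption added for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: n training phrases, d-dimensional semantic
# features (stand-ins for the study's extracted features), v voxels.
n, d, v = 200, 16, 50

X_train = rng.normal(size=(n, d))           # semantic features of stimulus phrases
true_W = rng.normal(size=(d, v))            # unknown feature-to-voxel mapping
Y_train = X_train @ true_W + 0.1 * rng.normal(size=(n, v))  # recorded BOLD responses

# Linear regression (ridge-regularized, an assumption) from features to voxels.
lam = 1.0
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ Y_train)

def score(candidate_features, recorded_response):
    """Score a candidate word sequence: how well does its predicted
    brain response match the recorded one? (Higher is better.)"""
    predicted = candidate_features @ W
    residual = recorded_response - predicted
    return -np.sum(residual ** 2)
```

A candidate sequence whose predicted response closely matches the recording scores near zero; a poor candidate scores far more negatively, which is how the model ranks likelihoods.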
There were still far too many candidate word sequences to feasibly evaluate, so they needed a way to narrow the options. To do that, they used a “generative neural network language model that was trained on a large dataset of natural English word sequences” to “restrict candidate sequences to well-formed English,” and a “beam search algorithm” to “efficiently search for the most likely word sequences”:
When new words are detected based on brain activity in auditory and speech areas, the language model generates continuations for each sequence in the beam using the previously decoded words as context. The encoding model then scores the likelihood that each continuation evoked the recorded brain responses, and the… most likely continuations are retained in the beam for the next timestep.
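The loop in that quoted passage can be sketched as a toy beam search, in which `continuations` stands in for the language model and `score_likelihood` for the encoding model; both names and their interfaces are assumptions for illustration.

```python
import heapq

def beam_search_decode(timesteps, continuations, score_likelihood, beam_width=3):
    """Toy beam search: at each timestep the language model proposes
    continuations for each partial sequence, the encoding-model score
    ranks them against the recorded brain responses, and the top
    `beam_width` sequences are retained for the next timestep."""
    beam = [((), 0.0)]  # (word sequence, score)
    for t in range(timesteps):
        candidates = []
        for seq, _ in beam:
            for word in continuations(seq):
                new_seq = seq + (word,)
                candidates.append((new_seq, score_likelihood(new_seq, t)))
        # Keep only the most likely continuations in the beam.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beam[0][0]
```

With a toy scorer that rewards matching a target sequence word-for-word, the search recovers that sequence, mirroring how the real decoder converges on the wording that best explains the recorded responses.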
They then had the subjects listen, while undergoing fMRI, to podcasts that the system was not trained on, to see whether it could decode the brain images into natural language that described what the subjects were thinking. The results:
The decoded word sequences captured not only the meaning of the stimuli but often even exact words and phrases, demonstrating that fine-grained semantic information can be recovered from the BOLD signal.
The paper includes figures comparing actual stimuli with the decoder’s predictions.
They then had the subjects undergo fMRI while merely imagining telling a story (again, not a story the system was trained on):
A key task for brain–computer interfaces is decoding covert imagined speech in the absence of external stimuli. To test whether our language decoder can be used to decode imagined speech, subjects imagined telling five 1-min stories while being recorded with fMRI and separately told the same stories outside of the scanner to provide reference transcripts. For each 1-min scan, we correctly identified the story that the subject was imagining by decoding the scan, normalizing the similarity scores between the decoder prediction and the reference transcripts into probabilities and choosing the most likely transcript (100% identification accuracy)… Across stories, decoder predictions were significantly more similar to the corresponding transcripts than expected by chance (P < 0.05, one-sided non-parametric test). Qualitative analysis shows that the decoder can recover the meaning of imagined stimuli.
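The identification step, normalizing similarity scores into probabilities and choosing the most likely transcript, could look something like this sketch. The softmax normalization is an assumption on my part; the passage does not specify exactly how the scores were converted to probabilities.

```python
import numpy as np

def identify_story(similarity_scores):
    """Given similarity scores between the decoder's prediction and each
    reference transcript, normalize them into probabilities (softmax,
    an assumed choice) and return the index of the most likely story."""
    s = np.asarray(similarity_scores, dtype=float)
    probs = np.exp(s - s.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```

For example, scores of `[0.2, 0.9, 0.1]` against three candidate transcripts would identify the second story; the study reports that this kind of identification succeeded for every scan (100% accuracy).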
The authors note that “subject cooperation is currently required both to train and to apply the decoder. However, future developments might enable decoders to bypass these requirements,” and call for caution:
even if decoder predictions are inaccurate without subject cooperation, they could be intentionally misinterpreted for malicious purposes. For these and other unforeseen reasons, it is critical to raise awareness of the risks of brain decoding technology and enact policies that protect each person’s mental privacy.
Note that this study describes research conducted over a year ago (the paper was just published, but it was submitted in April of 2022). Given the apparent pace of technological developments we’ve seen recently, has a year ago ever seemed so far in the past?