“We can apply scientific rigor to the assessment of AI consciousness, in part because… we can identify fairly clear indicators associated with leading theories of consciousness, and show how to assess whether AI systems satisfy them.”
In the following guest post, Jonathan Simon (Montreal) and Robert Long (Center for AI Safety) summarize their recent interdisciplinary report, “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness”.
How to Tell Whether an AI is Conscious
by Jonathan Simon and Robert Long
Could AI systems ever be conscious? Might they already be? How would we know? These are pressing questions in the philosophy of mind, and they come up more and more in the public conversation as AI advances. You’ve probably read about Blake Lemoine, or about the time Ilya Sutskever, the chief scientist at OpenAI, tweeted that AIs might already be “slightly conscious”. The rise of AI systems that can convincingly imitate human conversation will likely cause many people to believe that the systems they interact with are conscious, whether they are or not. Meanwhile, researchers are taking inspiration from functions associated with consciousness in humans in efforts to further enhance AI capabilities.
Just to be clear, we aren’t talking about general intelligence, or moral standing: we are talking about phenomenal consciousness—the question of whether there is something it is like to be an instance of the system in question. Fish might be phenomenally conscious, but they aren’t generally intelligent, and it is debatable whether they have moral standing. Same here: it is possible that AI systems will be phenomenally conscious before they arrive at general intelligence or moral standing. That means artificial consciousness might be upon us soon, even if artificial general intelligence (AGI) is further off. And consciousness might have something to do with moral standing. So there are questions here that should be addressed sooner rather than later.
AI consciousness is a thorny question, what with the hard problem, the persistent lack of consensus about the neural basis of consciousness, and unclarity about what next year’s AI models will look like. If certainty is your game, you’d have to solve those problems first, so: game over.
For the report, we set a target lower than certainty. Instead we set out to find things that we can be reasonably confident about, on the basis of a minimal number of working assumptions. We settled on three working assumptions. First, we adopt computational functionalism, the claim that the thing our brains (and bodies) do to make us conscious is a computational thing—otherwise, there isn’t much point in asking about AI consciousness.
Second, we assume that neuroscientific theories are on the right track in general, meaning that some of the necessary conditions for consciousness that some of these theories identify really are necessary conditions for consciousness, and some collection of these may ultimately be sufficient (though we do not claim to have arrived at such a comprehensive list yet).
Third, we assume that our best bet for discovering substantive truths about AI consciousness is what Jonathan Birch calls a theory-heavy methodology. In our context this means investigating whether AI systems perform functions similar to those that scientific theories associate with consciousness, and then assigning credences based on (a) the similarity of the functions, (b) the strength of the evidence for the theories in question, and (c) one’s credence in computational functionalism. The main alternatives to this approach are (1) to use behavioural or interactive tests for consciousness, which risks testing for the wrong things, or (2) to look for markers typically associated with consciousness, a method that has pitfalls in the artificial case that it may not have in the case of animal consciousness.
Observe that you can accept these assumptions as a materialist, or as a non-materialist. We aren’t addressing the hard problem of consciousness, but rather what Anil Seth calls the “real” problem of saying which mechanisms of which systems—in this case, AI systems—are associated with consciousness.
With these assumptions in hand, our interdisciplinary team of scientists, engineers and philosophers set out to see what we could reasonably say about AI consciousness.
First, we made a (non-exhaustive) list of promising theories of consciousness. We decided to focus on five: recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. We also discuss unlimited associative learning and ‘midbrain’ theories that emphasize sensory integration, as well as two high-level features of systems that are often argued to be important for consciousness: embodiment and agency. Again, this is not an exhaustive list, but we aspired to cover a representative selection of promising scientific approaches to consciousness that are compatible with computational functionalism.
We then set out to identify, for each of these, a short list of indicator conditions: criteria that must be satisfied by a system to be conscious by the lights of that theory. Crucially, we don’t then attempt to decide between all of the theories. Rather, we build a checklist of all of the indicators from all of the theories, with the idea that the more boxes a given system checks, the more confident we can be that it is conscious, and likewise, the fewer boxes a system checks, the less confident we should be that it is conscious (compare: the methodology in Chalmers 2023).
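To give a feel for how such a checklist might be operationalized, here is a minimal sketch in Python. It is only an illustration: the indicator labels GWT-1, GWT-3, and RPT-1 are the ones discussed below (the report lists more for each theory), while the credence values and the scoring rule are hypothetical assumptions, not the report’s actual procedure.

```python
# Toy sketch of the checklist idea: tally which indicators a system satisfies,
# weight by one's credence in each theory, and discount by one's credence in
# computational functionalism. Credences and the scoring rule are illustrative only.

indicators = {
    "GWT": ["GWT-1", "GWT-3"],  # global workspace theory (illustrative subset)
    "RPT": ["RPT-1"],           # recurrent processing theory (illustrative subset)
}

theory_credence = {"GWT": 0.4, "RPT": 0.3}  # hypothetical weights
functionalism_credence = 0.7                # hypothetical weight

def checklist_confidence(satisfied):
    """Rough 0-1 confidence that a system is conscious, given its checked boxes."""
    per_theory = (
        theory_credence[theory] * sum(ind in satisfied for ind in inds) / len(inds)
        for theory, inds in indicators.items()
    )
    return functionalism_credence * sum(per_theory)

# Example: a system that checks one GWT box and the RPT box.
print(checklist_confidence({"GWT-1", "RPT-1"}))  # ≈ 0.35
```

The point of the sketch is only the structure of the reasoning: the more boxes a system checks, the higher the resulting confidence, and everything is conditioned on one’s credence in computational functionalism.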
We used this approach to ask two questions. First: are there any indicators that appear to be off-limits, or impossible to implement in near-future AI systems? Second: do any existing AI systems satisfy all of the indicators? We answer no to both questions. Thus, we find no obstacles to the existence of conscious AI systems in the near future (though we offer no blueprints, and we do not identify a system in which all indicators would be jointly satisfied), but we also find no AI systems that check every box and would be classified as conscious according to all of the theories we consider.
In the space remaining, we’ll first summarize a sample negative finding: that current large language models like ChatGPT do not satisfy all indicators, and then we’ll discuss a few broader morals of the project.
We analyze several contemporary AI systems: Transformer-based systems such as large language models and PerceiverIO, as well as AdA, PaLM-E, and a “virtual rodent”. While much recent discussion of AI consciousness has understandably focused on large language models, this focus is overly narrow. Asking whether AI systems are conscious is rather like asking whether organisms are conscious: the answer will very likely depend on the system in question, so we must explore a range of systems.
While we find unchecked boxes for all of these systems, a clear example is our analysis of large language models based on the Transformer architecture, like OpenAI’s GPT series. In particular, we assess whether these models’ residual stream might amount to a global workspace. We find that it does not, because there is an equivocation in how this identification would go: do we think of the modules as confined to particular layers? Then indicator GWT-3 (see the report’s table of indicators) is unsatisfied. Do we think of the modules as spread out over layers? Then there is no way to distinguish the residual stream (i.e., the putative workspace) from the modules, and indicator GWT-1 is unsatisfied. Moreover, either way, indicator RPT-1 is unsatisfied. We then assess the Perceiver and PerceiverIO architectures, finding that while they do better, they still fail to satisfy indicator GWT-3.
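To make the residual-stream point concrete, here is a minimal sketch of a generic pre-norm Transformer block in PyTorch (an illustration of the standard architecture, not the internals of any particular GPT model). Every sublayer reads from the residual stream and adds its output back into it, which is what makes it hard to separate a limited-capacity workspace from the modules that are supposed to write to it:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """A generic pre-norm Transformer block (illustrative only)."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        # x is the residual stream: each sublayer reads from it and adds its
        # output back, so there is no separate, limited-capacity "workspace"
        # distinct from the "modules" (the attention and MLP sublayers).
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

stream = torch.randn(1, 8, 64)  # (batch, tokens, d_model): the residual stream
for block in [TransformerBlock() for _ in range(3)]:
    stream = block(stream)      # every layer writes back into the same stream
print(stream.shape)             # torch.Size([1, 8, 64])
```

On the first reading (modules confined to particular layers), the stream carries information forward from layer to layer rather than broadcasting it back to every module; on the second reading (modules taken to span layers), the modules are not distinguishable from the stream itself. Either way, the sketch shows why treating the residual stream as a workspace is strained.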
What are the morals of the story? We find a few: 1) we can apply scientific rigor to the assessment of AI consciousness, in part because 2) we can identify fairly clear indicators associated with leading theories of consciousness, and show how to assess whether AI systems satisfy them. And as far as substantive results go, 3) we found initial evidence that many of the indicator properties can be implemented in AI systems using current techniques, while also finding 4) that no current system satisfies all indicators and would be classified as conscious according to all of the theories we consider.
What about the moral morals of the story? All of our co-authors care about ethical issues raised by machines that are, or are perceived to be, conscious, and some of us have written about them. We do not advocate building conscious AIs, nor do we provide any new information about how one might do so: our results are primarily results of classification rather than of engineering. At the same time, we hope that our methodology and list of indicators contribute to more nuanced conversations, for example, by allowing us to more clearly distinguish the question of artificial consciousness from the question of artificial general intelligence, and to get clearer on which aspects or functions of consciousness are morally relevant.
The full set of authors is: Patrick Butlin (Philosophy, Future of Humanity Institute, University of Oxford), Robert Long (Philosophy, Center for AI Safety), Eric Elmoznino (Cognitive neuroscience, Université de Montréal and MILA – Quebec AI Institute), Yoshua Bengio (Artificial Intelligence, Université de Montréal and MILA – Quebec AI Institute), Jonathan Birch (Consciousness science, Centre for Philosophy of Natural and Social Science, LSE), Axel Constant (Philosophy, School of Engineering and Informatics, The University of Sussex and Centre de Recherche en Éthique, Université de Montréal), George Deane (Philosophy, Université de Montréal), Stephen M. Fleming (Cognitive neuroscience, Department of Experimental Psychology and Wellcome Centre for Human Neuroimaging, University College London), Chris Frith (Neuroscience, Wellcome Centre for Human Neuroimaging, University College London and Institute of Philosophy, University of London), Xu Ji (Université de Montréal and MILA – Quebec AI Institute), Ryota Kanai (Consciousness science and AI, Araya, Inc.), Colin Klein (Philosophy, The Australian National University), Grace Lindsay (Computational neuroscience, Psychology and Center for Data Science, New York University), Matthias Michel (Consciousness science, Center for Mind, Brain and Consciousness, New York University), Liad Mudrik (Consciousness science, School of Psychological Sciences and Sagol School of Neuroscience, Tel-Aviv University), Megan A. K. Peters (Cognitive science, University of California, Irvine and CIFAR Program in Brain, Mind and Consciousness), Eric Schwitzgebel (Philosophy, University of California, Riverside), Jonathan Simon (Philosophy, Université de Montréal), Rufin VanRullen (Cognitive science, Centre de Recherche Cerveau et Cognition, CNRS, Université de Toulouse).
[Top image by J. Weinberg, created with the help of Dall-E 2 and with apologies to Magritte]