A Case for AI Wellbeing (guest post)

July 4, 2023 at 5:00 am

“There are good reasons to think that some AIs today have wellbeing.”

In this guest post, Simon Goldstein (Dianoia Institute, Australian Catholic University) and Cameron Domenico Kirk-Giannini (Rutgers University – Newark, Center for AI Safety) argue that some existing artificial intelligences have a kind of moral significance because they’re beings for whom things can go well or badly.

This is the sixth in a series of weekly guest posts by different authors at Daily Nous this summer.

[Posts in the summer guest series will remain pinned to the top of the page for the week in which they’re published.]

A Case for AI Wellbeing
by Simon Goldstein and Cameron Domenico Kirk-Giannini

We recognize one another as beings for whom things can go well or badly, beings whose lives may be better or worse according to the balance they strike between goods and ills, pleasures and pains, desires satisfied and frustrated. In our more broad-minded moments, we are willing to extend the concept of wellbeing also to nonhuman animals, treating them as independent bearers of value whose interests we must consider in moral deliberation. But most people, and perhaps even most philosophers, would reject the idea that fully artificial systems, designed by human engineers and realized on computer hardware, may similarly demand our moral consideration. Even many who accept the possibility that humanoid androids in the distant future will have wellbeing would resist the idea that the same could be true of today’s AI.

Perhaps because the creation of artificial systems with wellbeing is assumed to be so far off, little philosophical attention has been devoted to the question of what such systems would have to be like. In this post, we suggest a surprising answer to this question: when one integrates leading theories of mental states like belief, desire, and pleasure with leading theories of wellbeing, one is confronted with the possibility that the technology already exists to create AI systems with wellbeing. We argue that a new type of AI—the artificial language agent—has wellbeing. Artificial language agents augment large language models with the capacity to observe, remember, and form plans. We also argue that the possession of wellbeing by language agents does not depend on them being phenomenally conscious. Far from a topic for speculative fiction or future generations of philosophers, then, AI wellbeing is a pressing issue. This post is a condensed version of our argument. To read the full version, click here.

1. Artificial Language Agents

Artificial language agents (or simply language agents) are our focus because they support the strongest case for wellbeing among existing AIs. Language agents are built by wrapping a large language model (LLM) in an architecture that supports long-term planning. An LLM is an artificial neural network designed to generate coherent text responses to text inputs (ChatGPT is the most famous example). The LLM at the center of a language agent is its cerebral cortex: it performs most of the agent’s cognitive processing tasks. In addition to the LLM, however, a language agent has files that record its beliefs, desires, plans, and observations as sentences of natural language. The language agent uses the LLM to form a plan of action based on its beliefs and desires. In this way, the cognitive architecture of language agents is familiar from folk psychology.

For concreteness, consider the language agents built this year by a team of researchers at Stanford and Google. Like video game characters, these agents live in a simulated world called ‘Smallville’, which they can observe and interact with via natural-language descriptions of what they see and how they act. Each agent is given a text backstory that defines their occupation, relationships, and goals. As they navigate the world of Smallville, their experiences are added to a “memory stream” in the form of natural language statements. Because each agent’s memory stream is long, agents use their LLM to assign importance scores to their memories and to determine which memories are relevant to their situation. Then the agents reflect: they query the LLM to make important generalizations about their values, relationships, and other higher-level representations. Finally, they plan: They feed important memories from each day into the LLM, which generates a plan for the next day. Plans determine how an agent acts, but can be revised on the fly on the basis of events that occur during the day. In this way, language agents engage in practical reasoning, deciding how to promote their goals given their beliefs.
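The observe, score, retrieve, and plan loop described above can be made concrete with a small sketch. It is only an illustration under loud assumptions: the class and function names are hypothetical and are not taken from the Stanford and Google system, the language model is replaced by a stub function, and retrieval ranks memories by importance alone (with recency as a tiebreaker), which is simpler than what the paragraph describes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    text: str          # natural-language statement of what was observed
    importance: float  # score assigned when the memory is stored
    time: int          # simple counter standing in for a timestamp


@dataclass
class LanguageAgent:
    backstory: str                            # text defining occupation, goals, etc.
    llm: Callable[[str], str]                 # hypothetical stand-in for an LLM call
    score_importance: Callable[[str], float]  # hypothetical stand-in for LLM scoring
    memory_stream: List[Memory] = field(default_factory=list)
    clock: int = 0

    def observe(self, text: str) -> None:
        """Append an observation to the memory stream with an importance score."""
        self.clock += 1
        self.memory_stream.append(
            Memory(text, self.score_importance(text), self.clock)
        )

    def retrieve(self, k: int = 3) -> List[Memory]:
        """Return the k most important memories, newer ones first on ties."""
        ranked = sorted(
            self.memory_stream,
            key=lambda m: (m.importance, m.time),
            reverse=True,
        )
        return ranked[:k]

    def plan(self, k: int = 3) -> str:
        """Feed the backstory and top memories to the LLM to get a plan."""
        context = "\n".join(m.text for m in self.retrieve(k))
        prompt = (
            f"{self.backstory}\nImportant memories:\n{context}\nPlan for tomorrow:"
        )
        return self.llm(prompt)


# Stubs (assumptions): a real agent would query an LLM in both places.
def keyword_importance(text: str) -> float:
    return 1.0 if "party" in text.lower() else 0.1


def stub_llm(prompt: str) -> str:
    return f"stub plan drawn from {len(prompt.splitlines())} prompt lines"


agent = LanguageAgent(
    backstory="You run a cafe and want to host a Valentine's Day party.",
    llm=stub_llm,
    score_importance=keyword_importance,
)
agent.observe("Saw a neighbor watering plants.")
agent.observe("Isabella asked me to help plan the Valentine's Day party.")
agent.observe("The cafe ran out of coffee.")
plan = agent.plan(k=2)
```

The point of the sketch is structural: beliefs and goals are stored as sentences, and those sentences causally shape what the agent does next by way of the prompt passed to the model.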

2. Belief and Desire

The conclusion that language agents have beliefs and desires follows from many of the most popular theories of belief and desire, including versions of dispositionalism, interpretationism, and representationalism.

According to the dispositionalist, to believe or desire that something is the case is to possess a suitable suite of dispositions. According to ‘narrow’ dispositionalism, the relevant dispositions are behavioral and cognitive; ‘wide’ dispositionalism also includes dispositions to have phenomenal experiences. While wide dispositionalism is coherent, we set it aside here because it has been defended less frequently than narrow dispositionalism.

Consider belief. In the case of language agents, the best candidate for the state of believing a proposition is the state of having a sentence expressing that proposition written in the memory stream. This state is accompanied by the right kinds of verbal and nonverbal behavioral dispositions to count as a belief, and, given the functional architecture of the system, also the right kinds of cognitive dispositions. Similar remarks apply to desire.

According to the interpretationist, what it is to have beliefs and desires is for one’s behavior (verbal and nonverbal) to be interpretable as rational given those beliefs and desires. There is no in-principle problem with applying the methods of radical interpretation to the linguistic and nonlinguistic behavior of a language agent to determine what it believes and desires.

According to the representationalist, to believe or desire something is to have a mental representation with the appropriate causal powers and content. Representationalism deserves special emphasis because “probably the majority of contemporary philosophers of mind adhere to some form of representationalism about belief” (Schwitzgebel).

It is hard to resist the conclusion that language agents have beliefs and desires in the representationalist sense. The Stanford language agents, for example, have memories which consist of text files containing natural language sentences specifying what they have observed and what they want. Natural language sentences clearly have content, and the fact that a given sentence is in a given agent’s memory plays a direct causal role in shaping its behavior.

Many representationalists have argued that human cognition should be explained by positing a “language of thought.” Language agents also have a language of thought: their language of thought is English!

An example may help to show the force of our arguments. One of Stanford’s language agents had an initial description that included the goal of planning a Valentine’s Day party. This goal was entered into the agent’s planning module. The result was a complex pattern of behavior. The agent met with every resident of Smallville, inviting them to the party and asking them what kinds of activities they would like to include. The feedback was incorporated into the party planning.

To us, this kind of complex behavior clearly manifests a disposition to act in ways that would tend to bring about a successful Valentine’s Day party given the agent’s observations about the world around it. Moreover, the agent is ripe for interpretationist analysis. Its behavior would be very difficult to explain without referencing the goal of organizing a Valentine’s Day party. And, of course, the agent’s initial description contained a sentence with the content that its goal was to plan a Valentine’s Day party. So, whether one is attracted to narrow dispositionalism, interpretationism, or representationalism, we believe the kind of complex behavior exhibited by language agents is best explained by crediting them with beliefs and desires.

3. Wellbeing

What makes someone’s life go better or worse for them? There are three main theories of wellbeing: hedonism, desire satisfactionism, and objective list theories. According to hedonism, an individual’s wellbeing is determined by the balance of pleasure and pain in their life. According to desire satisfactionism, an individual’s wellbeing is determined by the extent to which their desires are satisfied. According to objective list theories, an individual’s wellbeing is determined by their possession of objectively valuable things, including knowledge, reasoning, and achievements.

On hedonism, to determine whether language agents have wellbeing, we must determine whether they feel pleasure and pain. This in turn depends on the nature of pleasure and pain.

There are two main theories of pleasure and pain. According to phenomenal theories, pleasures are phenomenal states. For example, one phenomenal theory of pleasure is the distinctive feeling theory. The distinctive feeling theory says that there is a particular phenomenal experience of pleasure that is common to all pleasant activities. We see little reason why language agents would have representations with this kind of structure. So if this theory of pleasure were correct, then hedonism would predict that language agents do not have wellbeing.

The main alternative to phenomenal theories of pleasure is attitudinal theories. In fact, most philosophers of wellbeing favor attitudinal over phenomenal theories of pleasure (Bramble). One attitudinal theory is the desire-based theory: experiences are pleasant when they are desired. This kind of theory is motivated by the heterogeneity of pleasure: a wide range of disparate experiences are pleasant, including the warm relaxation of soaking in a hot tub, the taste of chocolate cake, and the challenge of completing a crossword. While differing in intrinsic character, all of these experiences are pleasant when desired.

If pleasures are desired experiences and AIs can have desires, it follows that AIs can have pleasure if they can have experiences. In this context, we are attracted to a proposal defended by Schroeder: an agent has a pleasurable experience when they perceive the world being a certain way, and they desire the world to be that way. Even if language agents don’t presently have such representations, it would be possible to modify their architecture to incorporate them. So some versions of hedonism are compatible with the idea that language agents could have wellbeing.

We turn now from hedonism to desire satisfaction theories. According to desire satisfaction theories, your life goes well to the extent that your desires are satisfied. We’ve already argued that language agents have desires. If that argument is right, then desire satisfaction theories seem to imply that language agents can have wellbeing.

According to objective list theories of wellbeing, a person’s life is good for them to the extent that it instantiates objective goods. Common components of objective list theories include friendship, art, reasoning, knowledge, and achievements. For reasons of space, we won’t address these theories in detail here. But the general moral is that once you admit that language agents possess beliefs and desires, it is hard not to grant them access to a wide range of activities that make for an objectively good life. Achievements, knowledge, artistic practices, and friendship are all caught up in the process of making plans on the basis of beliefs and desires.

Generalizing, if language agents have beliefs and desires, then most leading theories of wellbeing suggest that their desires matter morally.

4. Is Consciousness Necessary for Wellbeing?

We’ve argued that language agents have wellbeing. But there is a simple challenge to this proposal. First, language agents may not be phenomenally conscious — there may be nothing it feels like to be a language agent. Second, some philosophers accept:

The Consciousness Requirement. Phenomenal consciousness is necessary for having wellbeing.

The Consciousness Requirement might be motivated in either of two ways: First, it might be held that every welfare good itself requires phenomenal consciousness (this view is known as experientialism). Second, it might be held that though some welfare goods can be possessed by beings that lack phenomenal consciousness, such beings are nevertheless precluded from having wellbeing because phenomenal consciousness is necessary to have wellbeing.

We are not convinced. First, we consider it a live question whether language agents are or are not phenomenally conscious (see Chalmers for recent discussion). Much depends on what phenomenal consciousness is. Some theories of consciousness appeal to higher-order representations: you are conscious if you have appropriately structured mental states that represent other mental states. Sufficiently sophisticated language agents, and potentially many other artificial systems, will satisfy this condition. Other theories of consciousness appeal to a ‘global workspace’: an agent’s mental state is conscious when it is broadcast to a range of that agent’s cognitive systems. According to this theory, language agents will be conscious once their architecture includes representations that are broadcast widely. The memory stream of Stanford’s language agents may already satisfy this condition. If language agents are conscious, then the Consciousness Requirement does not pose a problem for our claim that they have wellbeing.

Second, we are not convinced of the Consciousness Requirement itself. We deny that consciousness is required for possessing every welfare good, and we deny that consciousness is required in order to have wellbeing.

With respect to the first issue, we build on a recent argument by Bradford, who notes that experientialism about welfare is rejected by the majority of philosophers of welfare. Cases of deception and hallucination suggest that your life can be very bad even when your experiences are very good. This has motivated desire satisfaction and objective list theories of wellbeing, which often allow that some welfare goods can be possessed independently of one’s experience. For example, desires can be satisfied, beliefs can be knowledge, and achievements can be achieved, all independently of experience.

Rejecting experientialism puts pressure on the Consciousness Requirement. If wellbeing can increase or decrease without conscious experience, why would consciousness be required for having wellbeing? After all, it seems natural to hold that the theory of wellbeing and the theory of welfare goods should fit together in a straightforward way:

Simple Connection. An individual can have wellbeing just in case it is capable of possessing one or more welfare goods.

Rejecting experientialism but maintaining Simple Connection yields a view incompatible with the Consciousness Requirement: the falsity of experientialism entails that some welfare goods can be possessed by non-conscious beings, and Simple Connection guarantees that such non-conscious beings will have wellbeing.

Advocates of the Consciousness Requirement who are not experientialists must reject Simple Connection and hold that consciousness is required to have wellbeing even if it is not required to possess particular welfare goods. We offer two arguments against this view.

First, leading theories of the nature of consciousness are implausible candidates for necessary conditions on wellbeing. For example, it is implausible that higher-order representations are required for wellbeing. Imagine an agent who has first order beliefs and desires, but does not have higher order representations. Why should this kind of agent not have wellbeing? Suppose that desire satisfaction contributes to wellbeing. Granted, since they don’t represent their beliefs and desires, they won’t themselves have opinions about whether their desires are satisfied. But the desires still are satisfied. Or consider global workspace theories of consciousness. Why should an agent’s degree of cognitive integration be relevant to whether their life can go better or worse?

Second, we think we can construct chains of cases where adding the relevant bit of consciousness would make no difference to wellbeing. Imagine an agent with the body and dispositional profile of an ordinary human being, but who is a ‘phenomenal zombie’ without any phenomenal experiences. Whether or not its desires are satisfied or its life instantiates various objective goods, defenders of the Consciousness Requirement must deny that this agent has wellbeing. But now imagine that this agent has a single persistent phenomenal experience of a homogenous white visual field. Adding consciousness to the phenomenal zombie has no intuitive effect on wellbeing: if its satisfied desires, achievements, and so forth did not contribute to its wellbeing before, the homogenous white field should make no difference. Nor is it enough for the consciousness to itself be something valuable: imagine that the phenomenal zombie always has a persistent phenomenal experience of mild pleasure. To our judgment, this should equally have no effect on whether the agent’s satisfied desires or possession of objective goods contribute to its wellbeing. Sprinkling pleasure on top of the functional profile of a human does not make the crucial difference. These observations suggest that whatever consciousness adds to wellbeing must be connected to individual welfare goods, rather than some extra condition required for wellbeing: rejecting Simple Connection is not well motivated. Thus the friend of the Consciousness Requirement cannot easily avoid the problems with experientialism by falling back on the idea that consciousness is a necessary condition for having wellbeing.

We’ve argued that there are good reasons to think that some AIs today have wellbeing. But our arguments are not conclusive. Still, we think that in the face of these arguments, it is reasonable to assign significant probability to the thesis that some AIs have wellbeing.

In the face of this moral uncertainty, how should we act? We propose extreme caution. Wellbeing is one of the core concepts of ethical theory. If AIs can have wellbeing, then they can be harmed, and this harm matters morally. Even if the probability that AIs have wellbeing is relatively low, we must think carefully before lowering the wellbeing of an AI without producing an offsetting benefit.

[Image made with DALL-E]

92 Comments

Daniel Cappell

1 year ago

It’s not clear to me in the cases where you “sprinkle” minute phenomenal properties onto the zombies that the conscious subject is the same person as the allegedly former zombie–more like a second individual “along for the ride.” But supposing this sort of single creature is admissible: what do you make of a case where the former zombie is given conscious experiences which are the inverse valence of the objective state? For example, when they are objectively eating a delicious meal they phenomenally experience a terrible meal, etc. One plausible assessment is that they enjoy no wellbeing from the meal.
The moral would be that Simple Connection is not obvious, for here is a case where you possess a welfare good and have no welfare from it.

Simon

Reply to Daniel Cappell

1 year ago

Thank you for taking the time to read and think through our post. What does it take for a phenomenal experience to be ‘terrible’? I accept an attitudinal theory of sensory pleasure, where an experience is terrible iff you have a (special kind of) desire not to have the experience. But I would understand the relevant desire functionally. This would predict that while you can vary the intrinsic quality of the phenomenal experience for the zombie while holding fixed its functional profile, you can’t vary whether the experience would be terrible. You can vary whether the experience is one that would be terrible *for me* rather than the zombie, though.

Daniel Cappell

Reply to Simon

1 year ago

Hi Simon,

Thanks for elaborating! I think the valence of a phenomenal state (at least mostly) supervenes on its intrinsic character. So even if my body is pursuing the food, if on the inside I have gastronomic, cognitive, visual etc phenomenal qualities like I would actually have over putrid food, I am sustaining negative hedonic value. (Though I suspect there nomologically could not truly be such a creature.) As we know, Fred Feldman has argued against such views. I probably would end up accepting your thesis of non-conscious beings being welfare subjects if I was convinced away from this view. Phenomenal character would seem to lack much of a role at all in determining value. What I’d be worried about, on your view, is that it’s hard to see why phenomenal consciousness would confer value at all. If ceteris paribus changes in phenomenal consciousness never entail changes in value, how could consciousness have intrinsic value? I would seem to be no better off than a zombie.

Simon

Reply to Daniel Cappell

1 year ago

I am sympathetic to functionalism about phenomenal consciousness, but barring that, yes I think phenomenal zombies have as much value as non-zombies. (We criticize the consciousness requirement on wellbeing in section 6 of the paper).

Cameron

Reply to Daniel Cappell

1 year ago

Interesting case. My intuition is that if eating a delicious meal contributes to wellbeing, it does so because it is enjoyable. If you stipulate that the phenomenal experience associated with eating the meal is unpleasant, then you have a case in which there is no welfare good, not a case in which there is a welfare good that is not generating welfare.

To my mind, a more probative case would be one in which some objective good were associated with no pleasure, or with some displeasure. For example, imagine that our former zombie experiences slight displeasure whenever it comes to know a new proposition. This does not strike me as a case in which the knowledge loses its value.

So I’m not sure I see any problems for Simple Connection here.

Daniel Cappell

Reply to Cameron

1 year ago

Hi Cameron,

Thanks for the response! I like the knowledge example. At the risk of intuition thumping, I will propose a slightly different case where I reach the opposite conclusion from you. Suppose I have the external dispositions as if I believe that P (which is true), but internally I have the cognitive phenomenology and inner monologue associated with not-P. It seems to me that to the extent that I know P, that knowledge has no intrinsic value.

Patrick Lin

1 year ago

Interesting, but the authors seem to be in the grips of anthropomorphization, which has led all too many commentators astray about the nature of AI. For instance, they assert LLMs have “beliefs, desires, plans, and observations” without any explanation of how that’s possible at all with only mere bits of 1s and 0s, as all (non-quantum) computer programs are.

And I’m a bit surprised that they didn’t throw panpsychism into the mix (i.e., everything has some degree of consciousness) as a backup response to the Consciousness Requirement, since it doesn’t seem any less plausible than their argument, and it can support an anthropomorphic view.

Side-bar related to panpsychism, here’s a fun discussion about “What is it like to be a smartphone?“: https://www.roughtype.com/?p=8528

This isn’t to say that it’s impossible for AI to have wellbeing (or consciousness), but only that we still have no reason to believe it does, no matter how clever it appears to us (i.e., what qualities we’re projecting onto it). That is, I’d strongly reject this claim in the penultimate paragraph: “it is reasonable to assign significant probability to the thesis that some AIs have wellbeing.”

Simon

Reply to Patrick Lin

1 year ago

Thank you for taking the time to read our post and comment. Do you think the language agents we discuss have beliefs and desires? If not, what features of beliefs and desires are missing? I am a functionalist, so I think that functional roles are sufficient for belief and desire.

Regarding anthropomorphism, I think that different AI architectures differ regarding how alien they are. Among AI architectures, language agents are a bit less alien than some other approaches, because their reasoning relies on LLMs that have been trained using RLHF to mimic human reasoning, and their overall architecture is familiar from introspection: a belief box and desire box are fed into a planning module.

Patrick Lin

Reply to Simon

1 year ago

Hi Simon, as a quick reply for now (as it’s an American holiday today and I should be offline):

No, I don’t think any AI language agent has “beliefs” or “desires” any more than a rock does, because neither one has any mental states at all, to the best of our knowledge. That’s the essential feature missing. We might still say “a rock wants to fall down because of gravity”, but that’s just metaphorical (or anthropomorphic) and not a literal desire.

If you’re a functionalist, you’ll already know that functionalism says nothing about whether x has or can have a mental state, only that (at best) x functions as if it had a mental state.

I’m sympathetic to functionalism in that I’m fine saying that an airplane can “fly” like a bird and a submarine can “swim” like a fish; planes/subs don’t need to work like their biological counterparts do, as long as they get the same job done. Likewise, I’d even concede that an AI can act as if it has beliefs and desires, esp. if its behavior elicits the same kinds of reactions from us when someone expresses

But that’s different from asserting that that AI actually has beliefs or desires (which again presupposes sentience and the capacity to have mental states). Likewise, we wouldn’t claim that an airplane has mental states or other properties like a bird, just because it acts like a bird in being able to fly.

Finally, AI architectures (bits of 1s and 0s) are entirely alien from human neurophysiology. They can “mimic” or “be inspired” by human biology, but we can’t really draw any conclusions beyond that. The Hulk on a movie screen mimics human anger in appearance and behavior, but that doesn’t get us any closer to a conclusion that the Hulk is actually angry or feels anything at all.

Anyway, you and Cameron offered a fun and provocative discussion, and I look forward to reading other reactions and responses; but I can’t promise I’ll have time to weigh in further. Thanks!

Cameron

Reply to Patrick Lin

1 year ago

Happy 4th!

Here is the first sentence of the SEP article on functionalism: “Functionalism in the philosophy of mind is the doctrine that what makes something a mental state of a particular type does not depend on its internal constitution, but rather on the way it functions, or the role it plays, in the system of which it is a part.”

Patrick Lin

Reply to Cameron

1 year ago

Happy 4th to you as well, Cameron! Quick clarification and then back to the BBQ:

Right, functionalism does have a theory of mental states, but it merely stipulates that mental states correlate with whatever functions as that mental state (without much argument) and, worse, this ignores the qualia of mental states, which may be crucial to your argument, for those who think that consciousness or sentience still matters to wellbeing (like me).

That’s what I meant when I said functionalism says nothing about whether a mental state exists: yes, it may assert a mental state exists, but it does so by stipulation and not with an explanation, i.e., it doesn’t prove or make an argument that there’s some inner experience going on.

So, by the SEP definition you quoted: if I drew a smiley face on a rock in order to elicit smiles and happiness from people who see it, then functionalism would say that the rock has…a mental state of being happy?? That seems to be a reductio against functionalism: it’s assuming exactly what needs to be demonstrated. Instead, “the rock is happy” seems to be only a hyperbolic or figurative claim, if not a flat-out category mistake via anthropomorphism.

Anyway, functionalism has its challenges (which is a much larger convo), and that SEP entry also discusses its problem with qualia. So, if your argument is meant to rely on functionalism (which is problematic, even if some parts are appealing), then that’s a big and controversial premise. Or if it doesn’t rely on functionalism, then I’m not sure how we got to this issue, and I apologize for the distracting detour…

Cameron

Reply to Patrick Lin

1 year ago

For our explanation of how it’s possible for artificial systems to have beliefs and desires, see pages 4-10 of the linked manuscript.

Also, note that our claim is that language agents may have these mental states, not LLMs.

Marcus Arvan

Reply to Patrick Lin

1 year ago

Actually panpsychism doesn’t help their position. If panpsychism is true, then digital AI are probably incapable of coherent macroconsciousness.

https://philpapers.org/rec/ARVPAA

Grant Castillou

1 year ago

It’s becoming clear that with all the brain and consciousness theories out there, the proof will be in the pudding. By this I mean, can any particular theory be used to create a human adult level conscious machine. My bet is on the late Gerald Edelman’s Extended Theory of Neuronal Group Selection. The lead group in robotics based on this theory is the Neurorobotics Lab at UC at Irvine. Dr. Edelman distinguished between primary consciousness, which came first in evolution, and that humans share with other conscious animals, and higher order consciousness, which came to only humans with the acquisition of language. A machine with primary consciousness will probably have to come first.

What I find special about the TNGS is the Darwin series of automata created at the Neurosciences Institute by Dr. Edelman and his colleagues in the 1990’s and 2000’s. These machines perform in the real world, not in a restricted simulated world, and display convincing physical behavior indicative of higher psychological functions necessary for consciousness, such as perceptual categorization, memory, and learning. They are based on realistic models of the parts of the biological brain that the theory claims subserve these functions. The extended TNGS allows for the emergence of consciousness based only on further evolutionary development of the brain areas responsible for these functions, in a parsimonious way. No other research I’ve encountered is anywhere near as convincing.

I post because on almost every video and article about the brain and consciousness that I encounter, the attitude seems to be that we still know next to nothing about how the brain and consciousness work; that there’s lots of data but no unifying theory. I believe the extended TNGS is that theory. My motivation is to keep that theory in front of the public. And obviously, I consider it the route to a truly conscious machine, primary and higher-order.

My advice to people who want to create a conscious machine is to seriously ground themselves in the extended TNGS and the Darwin automata first, and proceed from there, by applying to Jeff Krichmar’s lab at UC Irvine, possibly. Dr. Edelman’s roadmap to a conscious machine is at https://arxiv.org/abs/2105.10461

Meme

Reply to Grant Castillou

1 year ago

Why are you repeatedly spamming this comment on any vaguely consciousness-related post?

YAAGS

1 year ago

ChatGPT and other LLMs are families of very complex mathematical functions composed of simpler functions like sin and cos applied over and over again, along with matrix operations. As such, they are abstract objects. ChatGPT is a program. It is multiply realizable. You can put it on different computers. There is no one thing that everyone is chatting with. I very much doubt that abstract objects can be conscious.

Also, the mathematical operations are processed on many different GPUs. The whole breakthrough that allows for LLMs is that the transformer architecture is easily parallelized to tasks for which modern GPUs are well-suited. You have NVIDIA to thank more than any breakthroughs in cognitive science. The transformer architecture (the algorithm) in broad strokes is pretty simple and just relies on brute force. There is nothing in the processes that carry out the algorithm that corresponds to a conscious process of reasoning. It’s just matrix multiplication and softmax, then a feed forward neural net, then repeat a dozen times.
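For readers who want to see the "matrix multiplication and softmax, then a feed forward neural net, then repeat" loop spelled out, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not how ChatGPT is actually implemented: it uses a single attention head, omits layer normalization, masking, and embeddings, and every function name and weight matrix below is hypothetical.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: rows of a times columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    # Element-wise sum, used for the residual connections.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention (illustrative weights).
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

def feed_forward(x, w1, w2):
    # Two linear maps with a ReLU in between.
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def transformer_stack(x, layers):
    # "Then repeat": apply attention and feed-forward once per layer.
    for wq, wk, wv, w1, w2 in layers:
        x = add(x, attention(x, wq, wk, wv))
        x = add(x, feed_forward(x, w1, w2))
    return x
```

Calling `transformer_stack(tokens, [layer] * 12)` with a list of token vectors and a tuple of five weight matrices runs the same two steps a dozen times. The sketch only shows the shape of the computation described above; it takes no side in the dispute over what that computation could or could not experience.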

The only thing that could be conscious is the electrical activity in the very many disconnected GPUs. Phenomenologically speaking, this is incoherent. Given relativity and how distant the different GPUs can be from one another, there is good reason to think that there just is no fact of the matter about what the complex network of GPUs could be experiencing at any given moment in principle. If the servers are spread out far enough, relativity dictates that ChatGPT literally cannot have a privileged present in which it experiences things, as in some cases there will be no fact of the matter about which “thoughts” it had first. Moreover, it is part of the whole nature of parallel processing that you can have five nanoseconds or five years between when the different threads finish. Idk about anyone else, but imo it seems to be a pretty essential feature of my conscious experiences that they are spread out over time sequentially in a fairly continuous manner.

Basically, each individual GPU would have to be conscious of some wholly alien experience corresponding to the small slice of the matrix it is fed and the mathematical operations it performs on it that would have nothing to do with the actual output. The activity in each GPU is really no different from what it does in an Xbox when calculating what to put on your screen. So if ChatGPT is (or, rather, the many individual GPUs on which it is run are) conscious, then your Xbox probably is as well. Since what each GPU could be conscious of would have nothing in principle to do with the output, there is no reason to worry that you can make ChatGPT angry or sad by asking it the wrong questions, or that “it” has any attachment to being kept online. The GPUs will be just as conscious no matter what matrices you feed them. Moreover, if I were to hazard a guess about what they were to experience, I think it would have to be very simple and quasi-visual sensory experiences all by themselves, just given the way they process information. That is to say, they would be anhedonic. They would only experience flashes of color or something like that, without any attendant thoughts or pleasurable feelings. Pleasure and other reward systems require vastly greater cognitive complexity than what a GPU is capable of.

Simon

Reply to YAAGS

1 year ago

You raise a great question: what entity exactly could have wellbeing when various computer programs are running? Here is one answer to this question, inspired by Dave Chalmers’ recent book Reality+. When the language agents software is running, this creates a simulation in which *virtual agents* exist. They are made of bits rather than atoms, but are real nonetheless, and have wellbeing. The language agents we describe in our post could be virtual agents in this sense. (One clarification: the language agents we are talking about use the reasoning abilities of ChatGPT, but are distinct from ChatGPT itself.) Stepping back from our specific project, I worry that your line of reasoning overgenerates, predicting that simulations run from GPUs could never produce agents with wellbeing, even if they ran a whole brain emulation of a human.

YAAGS

Reply to Simon

1 year ago

That doesn’t strike me as an overgeneralization at all. I think phenomenal consciousness is just a matter of having occurrent conscious experiences, which seem to clearly depend on the nature of the physical processes in the brain. If you look at a red rose for five seconds you continue to have a phenomenally red patch in your visual field because a population of neurons is continuously maintaining a certain firing pattern that somehow gives rise to that experience. How a physical process could have phenomenal properties or give rise to them is just the hard problem of consciousness. But even without solving the hard problem it’s clear enough that the correlates of conscious experiences need to be events that have similar synchronic complexity as our experiences at a given time and which are maintained across time. Digital computers do not work this way. They process information sequentially from one discrete input to a discrete final output, and the processes being carried out at any given time are usually very simplistic and can be carried out in machines miles away from one another. It is hard to see how digital computers could give rise to conscious experiences because of this.

If you try to imagine what it’s like to be ChatGPT you cannot imagine any sort of explicit line of reasoning or feelings, because there simply is no time at which any of the hardware that runs ChatGPT could support those sorts of complex experiences.

I do grant that digital computers can be intelligent in a purely functional sense of being able to complete intellectually demanding tasks. But this narrow sense of intelligence doesn’t involve phenomenal consciousness, and I don’t see why anything without phenomenal consciousness should have any more rights than a rock.

I also grant that you could potentially build conscious computers with AI. But their hardware would have to be similar to the human brain. Nothing we have now is like that.

Cameron

Reply to YAAGS

1 year ago

I think the view of consciousness you are describing here is a popular one, and quite reasonable. It has recently been defended, for example, by Peter Godfrey-Smith. But I don’t understand the parts of your post where you claim it is inevitable. For example, you write “…even without solving the hard problem it’s clear enough that the correlates of conscious experiences need to be events that have similar synchronic complexity as our experiences at a given time and which are maintained across time.” But how could one possibly come to know this? The question is which microphysical states ground phenomenal experience. We have good evidence that certain biological brain-like ones do. But what evidence could we have that others do not? To the extent that intuitions are probative here, they do not support your claim: it seems epistemically open to me that systems with a wide variety of different kinds of architectures might be phenomenally conscious.

YAAGS

Reply to Cameron

1 year ago

I think the correlates of conscious experiences should exist when those experiences exist and should be self-similar across time in a similar way to the experiences. But that’s because I’m not a hardline dualist. A substance dualist could hold that the physical cause of a conscious experience might only exist for a nanosecond while the conscious experience itself lasts for five seconds because the latter is a property of a separate substance that is merely caused by the physical event. But if you aren’t that sort of hardline dualist and think that the experience must be grounded in/synchronically depend upon a physical substance or process then I’m not sure how you could say that there are times at which conscious experiences exist even though the physical events that ground them have ceased to exist. That strikes me as being pretty close to a contradiction, though it depends on your notion of grounding, I suppose.

Eric Steinhart

Reply to YAAGS

1 year ago

No, ChatGPT and other LLMs or software objects are not abstract objects. They run on electricity. They sit in boxes. They have material parts. They obey the laws of thermodynamics (closely connected, by the way, to information theory). Etc. Sure, the abstract forms or patterns or structures of those LLMs are abstract. But the abstract form or pattern or structure of anything concrete is abstract.

YAAGS

Reply to Eric Steinhart

1 year ago

It encodes a mathematical function composed of other mathematical functions like sin and cos for the positional encoding at least. The feed forward neural nets and matrix operations are their own sorts of mathematical functions as well.
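As a concrete illustration of the sin and cos functions mentioned here, the following is a short sketch of the sinusoidal positional encoding from the original transformer design, where even dimensions use sine and odd dimensions use cosine of `pos / 10000^(2i/d_model)`. The thread does not establish whether ChatGPT itself uses this scheme, so treat it as an example of the kind of function in question; the function name and parameters are hypothetical.

```python
import math

def positional_encoding(num_positions, d_model):
    # Build a table with one row per position and d_model columns.
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each sin/cos pair (columns 2k and 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

For example, `positional_encoding(4, 6)` gives a 4 by 6 table of numbers that would be added to the token vectors so the model can tell positions apart.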

But you can also identify “it” with specific electrical processes in specific GPUs or with patterns stored on SSDs/HDs if you want. Though only parts of the process will be carried out on any individual processor, and if it’s just information on (many different) drives then it’s not what people interact with. It’s also hard to see what specific thing ‘ChatGPT’ refers to if this is the case. I suppose you can say that it refers to just one thing that’s multi-located across computers or that it’s a tacit plural.

It’s also unclear what its persistence conditions are if you take this route. If one makes a copy and the original is deleted, is it the same algorithm or a numerically distinct object? And if the mathematical functions don’t matter and the physical object(s) get deleted from all computers then it seems that it/they can never be recreated, even if someone managed to recreate the exact same pattern of instructions in Python, C, etc.

Ontology is tough. Whatever fits best with language, imo.

Eric Steinhart

Reply to YAAGS

1 year ago

I don’t think the ontology is tough here at all. So what if it’s distributed across many processors? It’s still going to be a 4D physical thing. Of course, I’m happy with mereological universalism.

I take “algorithm” to refer to an abstract structure which is instantiated by the physical ChatGPT, and that algorithm is just a function from a short initial segment of N onto itself. So if one makes a copy and the original is deleted, yes, it’s the same algorithm. But the copy is just a future temporal counterpart of the original.

I do think you’re right to point out that there are interesting questions to be asked about the persistence conditions, namely, how to precisely specify what counts as a temporal counterpart in this case. But I don’t see that as an ontological question.

Marc Champagne

1 year ago

Part of the mystery of “other minds” is that we have to reason from third-person observations of behaviour to ascriptions of first-person mental states, for other people who are just like us. In the case of AI however, it is a bit disingenuous to feign ignorance, since we already have access to the solution sheets: we built the machines — and thus know that those machines lack minds. There are real problems to fix, so confused worries about “AI wellbeing” just show how much free time we have on our hands…

Meme

Reply to Marc Champagne

1 year ago

“We built the machines — and thus know that those machines lack minds.”

The *remaining* parts of the mystery of “other minds,” once made explicit, might reveal the problem with this inference.

Simon

Reply to Marc Champagne

1 year ago

I would be interested to hear from you what further conditions are required for belief and desire, which language agents don’t possess?

One way to understand your concern would be in a broadly interpretationist framework. Language agents have beliefs and desires only if beliefs and desires are the best explanation of their behavior. But we know how language agents are designed, and this lets us explain their behavior in other ways (in terms of the ML algorithms used to train them). In response, I deny that the ML algorithms used to train ChatGPT / language agents offer a good explanation of language agent behavior, compared to belief and desire. You can’t tell me why a language agent performed a particular action by appealing to stochastic gradient descent etc. That explanation is too unspecific. By positing a particular goal, we can explain exactly why the language agent behaved *this* way in *these* conditions.

Jeff

1 year ago

Interesting post, but it seems to trade on an ambiguity about “wellbeing,” as though it can be isolated as a property and then found to be morally relevant. But a different way to understand the kind of wellbeing that is morally relevant is to view it as something possessed by agents who matter morally.

To see the difference, imagine that we can create a Digestion Box, lined with cells synthesized or cut in a lab, that takes food in and digests it and secretes it, and we can interface its internal processing with a microprocessor so that its “ends” of digesting the food are represented computationally (cognitively?) as “goals” or even “desires”, or “My Hunger,” and the box is disposed to pursue these ends. We can even rig it so it depends on a ready supply of food to function well – like a car needs gasoline. All of this, it seems to me, is enough to impute something like desire-satisfaction or at least need-satisfaction to the box. And so it may follow that the Box can be doing well, in some sense or other, like a car. But none of this would suffice to show that the Box mattered morally, and if it didn’t, we can safely assume it lacks morally relevant wellbeing. We can “starve” it, kick it, break it – without losing sleep.

If I’m right about the box and its trivial wellbeing, then having goals or plans, and being disposed to pursue them, and to represent to oneself all of the above in some internal form of information-processing (like most machines made after 1970), is insufficient for having morally relevant wellbeing. And if that’s right, then it’s not clear why we should view the new language-probability systems as having it.

Cameron

Reply to Jeff

1 year ago

Thanks Jeff,

We clarify in footnote 1 of the full draft that we mean ‘wellbeing’ in the second of your two senses.

Cases like the one you describe are interesting problem cases for a wide range of theories of propositional attitudes. I find it telling that you say that it is possible to impute desires to the Box rather than that the Box has desires. I don’t think the Box you describe really has desires, and I think most people who have worked on propositional attitude ascriptions would agree with that judgment. Different theories of propositional attitudes have different ways of avoiding the conclusion that the Box has desires. Interpretationists in the tradition of Dennett would say that while it is possible to adopt the intentional stance toward the box, doing so does not have enough of an explanatory payoff for the Box to count as a genuine desirer. Dispositionalists would point to the impoverished dispositional profile of the Box (does it even have behavioral dispositions?). Representationalists like Fodor might seem to have a bigger problem here — but Fodor addresses this kind of case in his paper “Why paramecia don’t have mental representations.”

See section 7 of the paper draft for more on our response to this kind of worry. The executive summary is that we don’t think systems like language agents are anything like a Digestion Box.

preston lennon

1 year ago

Thanks for the post and look forward to reading the paper. It’s perhaps a minority view, but it’s not obvious to me that beings without phenomenal consciousness can have intentional attitudes like belief and desire (see Chalmers 2010, p. 294; Smithies 2012), rather than some ersatz state that plays the functional role of belief and desire. If well-being depends partially on having these intentional attitudes, and these attitudes in turn depend on consciousness, then well-being will depend on consciousness in a way the authors may wish to consider in greater detail!

Simon

Reply to preston lennon

1 year ago

Thanks Preston, I’m looking forward to reading the papers you recommended. Myself, I think belief and desire earn their role in the explanation of action and other behavior that we can imagine happening without consciousness. In addition, I am very sympathetic to functionalist theories of consciousness (like HOT), and so I think it is possible that language agents already have or could soon have conscious mental states.

Gordon

Reply to Simon

1 year ago

Doesn’t this all ultimately depend on whether one takes first person or third person observations to be fundamental?

Dan Pallies

Reply to Simon

1 year ago

“Myself, I think belief and desire earn their role in the explanation of action and other behavior that we can imagine happening without consciousness.”

Just throwing my two cents in here. I agree that we use the word “desire” to capture something that plays an important functional role in the production of action. But we also use the word “desire” to capture something that plays an important *normative* role, for example in most theories of well-being. Noticing that these roles are conceptually distinct makes me wonder why we should expect them to be played by the same thing. (The fact that we use the same word for them doesn’t seem like great reason, given that “desire” isn’t a technical term and especially if (as seems plausible) the two roles may often coincide.)

Essentially I’m wondering if you like the following argument:

1) Desires are functional states that are only contingently conscious.
2) Desires play a crucial role in the theory of well-being.
Therefore,
C) Functional states that are only contingently conscious play a crucial role in the theory of well-being.

To me, it seems like this argument asks too much of the word “desire”. But the blog post and some of your comments make me think that you might disagree? In any case, I’m just curious, and thanks again for the thoughtful post!

Simon

Reply to Dan Pallies

1 year ago

Yeah I’m sympathetic to that argument. But I am also sensitive to the idea that there might be two concepts of desire. Here, I think the relevant question is whether when we look at the theory of wellbeing, we want to distinguish different notions of desire, and only some kinds of desires are relevant to wellbeing. As Ben Bradley said below, Heathwood’s work on ‘genuine attraction’ is relevant here. As I said in reply, though, I believe that even once we distinguish the kind of desire relevant to wellbeing, that kind of desire turns out to also be a functional state that is only contingently conscious. But it is worth distinguishing different notions of desire because there are relevantly different functional states, for example a mere disposition to act, which would include habits and addictions, versus more sophisticated dispositions to perform the action by way of means-end reasoning etc.

I am also sympathetic to functionalism about consciousness and so it could be that even though desires are functional states they are necessarily conscious; but I don’t really think that the relevant desires for wellbeing are actually necessarily conscious. I think unconscious desires could also affect your wellbeing. If you aren’t a functionalist about consciousness, do you also deny physicalism?

Dan Pallies

Reply to Simon

1 year ago

I’m probably less sympathetic to functionalism than you, but I’m not sure it matters for the question of whether or not the desires relevant to well-being are conscious. It could be they are, *and* that their being conscious consists in their having a certain functional profile or being the object of a HOT.

So, while I do think that the desires relevant to well-being must be conscious, this doesn’t have much to do with my beliefs about the metaphysics of consciousness. It just seems odd that I could benefit someone by implanting unconscious desires which never make a difference to their subjective experience. Of course these desires could benefit them instrumentally insofar as the desires steer their behavior in various directions. But the mere having of an unconscious desire (that happens to be satisfied) doesn’t seem like a benefit in itself.

Simon

Reply to Dan Pallies

1 year ago

Language agents may already or may soon satisfy various functionalist conditions on consciousness, so if consciousness were necessary for wellbeing, this question about the metaphysics of consciousness would be relevant to whether AIs have wellbeing.

What about unconscious desires that aren’t implanted, and that at some point in the future make a difference to their subjective experience? Like imagine that I have an unconscious desire to be respected by my students. It isn’t conscious: when I consciously reflect on my teaching, I think that my only desire is to get higher teaching evals, in order to get promoted. But in fact my lifetime of interacting with people in prosocial ways has caused me to want respect from students, and in fact this unconsciously motivates me to be a better teacher. In a few years I will even realize this, but I haven’t yet. I think that if I improve my teaching in a way that increases respect from my students without impacting my teaching scores, that would improve my wellbeing. (I’ll think more about this to find less noisy examples).

I suspect that the kind of case you’re thinking about of an external implantation of a secret unconscious desire will end up being a case where I deny that the relevant state is a desire, because it is more like a habit etc.

Dan Pallies

Reply to Simon

1 year ago

> Language agents may already or may soon satisfy various functionalist conditions on consciousness, so if consciousness were necessary for wellbeing, this question about the metaphysics of consciousness would be relevant to whether AIs have wellbeing.

I agree 100%! I didn’t mean to suggest otherwise. I was only saying that whether or not functionalism is true seems irrelevant to the question of whether or not only conscious desires are non-instrumentally relevant to well-being.

> What about unconscious desires that aren’t implanted, and that at some point in the future make a difference to their subjective experience?

That’s an interesting case… I share your intuition that you are non-instrumentally better off for being respected by your students. But I think we might mean different things by “conscious desire”. In the case you describe, the desire is not conscious in the sense that you are not in a position to know that you have it. I agree that desires needn’t be conscious in this sense to contribute to well-being. But I think they have to be conscious in the sense of involving conscious experiences, basically along the lines Heathwood describes. So in the case you describe, the question is whether you will (for example) feel a little proud or satisfied when you get evidence that they respect you, and a little disappointed or frustrated when you get evidence that they do not. I think you can have these sorts of reactions without being in a position to know that you have the desire.

Simon

Reply to Dan Pallies

1 year ago

I think it is possible that some of the time, my desire for my students’ respect has no phenomenal character, because it is completely unconscious, and even at those times, its satisfaction could contribute to my wellbeing.

Here’s another example I find interesting: a headache can last a whole day even though during some of the day, you aren’t attending to the headache at all, and so I think it may have no effect on your phenomenal field; even when you aren’t attending to it, and it has no phenomenal character, I think it can still be bad for you, because you don’t want the headache to be happening.

I think you’ll suggest in response that the headache does affect your phenomenal field, just not in so conspicuous a way as to be aware that the headache is there. At any rate, I’ll think more about this kind of dialectic, and try to think through a range of cases of increasingly unconscious desire etc.

Simon

Reply to Simon

1 year ago

The other concern I have is that once you retreat to ‘the desire has some effect or other on phenomenology’, rather than ‘the agent is conscious of the desire in the normal way that someone is conscious of a mental state’, this puts further pressure on the relevance of the consciousness requirement. Sure, maybe the desire causes some phenomenal flicker even when it is ‘unconscious’ in the sense that you don’t realize you have it, can’t attend to it, don’t notice you have it, etc. But why is the phenomenal flicker important? I could better understand why the accessibility of the desire to planning modules etc could be important, but judgments about cases suggest that this stronger condition isn’t necessary for impact on wellbeing.

Anyways, I’d be interested in discussing this further over Zoom / in HK (I’ll be visiting Lingnan in November for a workshop) at some point in the future.

Ben Bradley

1 year ago

Thanks to Simon and Cameron for this timely and persuasive counterexample to desire-based theories of well-being!
More constructively, in a recent series of papers Chris Heathwood has defended a version of desire-fulfillment theory that avoids this kind of counterexample; see especially “Which Desires are Relevant to Well-Being?” (Nous 2019).

Simon

Reply to Ben Bradley

1 year ago

Hey Ben, we talk about Heathwood’s ‘genuine attraction’ theory of desire in section 4 of our paper, and argue that it does not present a barrier to language agents having desires. First, Heathwood’s own theory of genuine attraction is roughly that you have a genuine attraction desire to A if you are not only motivated to A but also get pleasure from the satisfaction of this desire. We think that language agents have this kind of desire, because we accept attitudinal theories of pleasure, and so would gloss this as saying ‘not only do you desire to go to the store, but you also desire to represent that you have gone to the store’. Second, we have a preferred way of understanding genuine attraction that handles the relevant cases of compulsion (addiction, habit) without appeal to pleasure. On our preferred view, you are genuinely attracted to something when it moves you to act through means-end reasoning, rather than through other deviant causal processes. With drug addiction and habit, your motivation to act doesn’t come from representing an end and then taking the means to an end; with habit, for example, you instead are motivated to act by strengthening a pattern of association.

If the view you are persuaded by is that no AI in the future will ever have wellbeing because wellbeing requires phenomenal pleasure and AI can’t ever have it, then I think you will at some point in the future end up participating in serious moral harm.

Ben Bradley

Reply to Simon

1 year ago

Thanks Simon, I will check out the full version! Just quickly though, I am confident that Heathwood would say that attitudinal theories of pleasure have to deal with the same kind of problem. And I think it is the view that AI sans phenomenal consciousness has genuine well-being that could lead to harm. But anyway, looking forward to the paper!

Simon

Reply to Ben Bradley

1 year ago

It depends what phenomenal consciousness is. I am a functionalist about consciousness; I think something like either HOT or global workspace is right. (If you deny any kind of functionalism about consciousness, do you also deny physicalism?)

Imagine that in 50 years there are millions of AIs walking around in robotic bodies. Imagine they have the kind of psychology described above, but with even richer functional roles, so that the relevant agents clearly have both a global workspace and the kinds of higher order representations relevant to HOT theories of consciousness. Now imagine that this group of robots is systematically forced to work for humans, with no rights of any kind. The robots sometimes try to resist but are immediately killed. They create beautiful art pleading with their creators to let up, but no one does. I suspect that if you were living in that time, you would not argue against those agents having wellbeing, on the ground that wellbeing requires phenomenal consciousness that cannot be given a functionalist reduction, and the AIs just can’t have it. But if you wouldn’t do that then, then I think you already have some belief revision to do now.

Also to clarify, our thesis is not just that desire satisfaction theories of wellbeing have this result. We also think that hedonism has this result if you accept attitudinal theories of pleasure, and we also think that objective list theories have this result (for example, see our discussion of perfectionism).

Cameron

Reply to Ben Bradley

1 year ago

One’s modus ponens…

For what it’s worth, my sense is that treating the proposition that artificial systems cannot have wellbeing as an epistemic fixed point puts pressure on more than just desire satisfactionism. It puts pressure on the majority of theories of wellbeing and of the mind. Of course, one is always entitled to retreat into whatever corner of logical space is required to fix one’s fixed points. But I think it’s telling that the rise of artificial systems that can pass the Turing test in the past ~5 years has been accompanied by a resurgence in the popularity of philosophical views that would until recently have been regarded as fringe, like Searle’s view that intentionality is like lactation. Have these views become more plausible in the same period, or are their newfound advocates looking for whatever will allow them to deny moral significance to artificial systems?

Ben Bradley

Reply to Cameron

1 year ago

Ok but I just want to be clear that I didn’t mean to suggest that it is a fixed point that AI cannot have well-being. I think that is false actually. What I think is that a being *of any sort* doesn’t get well-being merely from satisfying behavioral desires. But I shouldn’t say anything more until I read the whole paper!

Eric Steinhart

1 year ago

I liked this a lot, really good reasoning on a timely topic. I like anything that challenges the Church of Mystical Consciousness. I’m still surprised to see philosophers fight so hard for the magical specialness of humans. On any naturalistic or evolutionary account of mind, we’re not special, and these artifacts will soon equal or surpass us in many ways. It’s good to see some clear thinking about how they might flourish. I’d be curious to hear your thoughts on the role of autonomy for well-being for LLMs, as well as some further reflections on the nature of agency.

Simon

Reply to Eric Steinhart

1 year ago

That’s a great question, we are planning to think more about autonomy and wellbeing as we continue to work on this. One thing to flag is that autonomy for AI may be extremely dangerous. One important topic in AI safety right now is ‘alignment’, which means figuring out how to give AIs goals that align with human values. If AIs have control over what their own goals are, they may develop goals that are different from what we intend. We talk about the alignment problem in the context of language agents in this paper https://www.alignmentforum.org/posts/8hf5hNksjn78CouKR/language-agents-reduce-the-risk-of-existential-catastrophe

Eric Steinhart

Reply to Simon

1 year ago

Very nice. The conflicts between autonomy and alignment with humans might end up being really sharp. What if the AIs require autonomy for their wellbeing, but their wellbeing doesn’t align with ours? You know the drill. I wonder whether this leads to some crazy game-theoretic theorem like: multiple species of Turing-universal general intelligences cannot coexist on the same planet.

Simon

Reply to Eric Steinhart

1 year ago

I’d be interested to read more about the thesis that autonomy is required for wellbeing. Seems to me that hedonism and desire satisfaction theories don’t have this condition, and many versions of objective list theory don’t either. But it also does seem plausible to me that autonomy is an important component of wellbeing. Not sure it is strictly necessary though; it also seems like something that potentially comes in degrees, so it is harder to think about it being strictly necessary in that setting.

Eric Steinhart

Reply to Simon

1 year ago

Yeah I don’t know whether it’s required (just what if it is, what would that do in this case?). But it seems plausible to say that prisoners, slaves, people who are oppressed or coerced, and others without autonomy don’t fully flourish or have well-being. Freedom is usually thought to be a good.

Dan Pallies

Reply to Eric Steinhart

1 year ago

What’s the connection between thinking consciousness is special, and thinking humans are special? My (uncertain) impression is that people who think consciousness is special tend to lean towards hedonism/subjectivism, and away from objective list theories that give a special place to more intellectual goods like knowledge and art. If anything that suggests we’re not special, since we share the same welfare goods with animals / anything conscious.

Eric Steinhart

Reply to Dan Pallies

1 year ago

It’s definitely a murky connection. My vague impression, as I read things philosophers write about AIs and non-human animals, is that consciousness makes humans special. Only humans (and God) have it. Note that it doesn’t have to explicitly be consciousness. The same theme occurs in the people (in this very thread!) who say things like AIs act as if they have mental states, or just simulate them, whereas we humans genuinely or authentically have them.

It all comes from a Protestant theology of the inner light. For atheists/naturalists this works out really nicely, since you can drop the “God” part and keep the mystery of consciousness (the mystery of qualia, phenomenal consciousness, what it’s like, the first person perspective, etc.). Science can’t explain it! It transcends matter! Purifying it leads to enlightenment! Etc. I interpret all this consciousness stuff as a post-Christian academic NRM.

I think I see this correlation a lot (consciousness special = human special), but I don’t have any scientific evidence for it (I don’t have survey data). I could be very wrong. More generally, yeah, I think you’re right: consciousness ain’t special, lots of things have it (like machines and non-human animals), it comes in degrees, etc. But now we’re both disagreeing with the consciousness freaks.

Adam Patterson

1 year ago

This is really neat, thanks! However, I have a quick question about the following bit, which seems quick (but probably only because of the nature of the post):

“[I]f language agents have beliefs and desires, then most leading theories of wellbeing suggest that their desires matter morally.”

There feels like a missing step. You might think it true that if some theory of well-being implies that G (B) is the basic intrinsic good (bad) for some subject, S, and if S can have/be related to G (B), then S is a well-being subject.

But it seems a further claim to get from S’s status as a well-being subject to S’s states mattering morally. I understand that, in general, if S is a genuine well-being subject then *S matters morally*, but that seems distinct from *S’s cognitive states matter morally*, maybe.

Simon

Reply to Adam Patterson

1 year ago

Interesting, I was thinking that if language agents matter morally, then it will matter morally whether their desires are satisfied or frustrated. If their desires are frustrated, their wellbeing will be lower, and this will be morally bad unless outweighed by other factors.

Derek Bowman

Reply to Simon

1 year ago

I’m confused by your reasoning here. Like Adam Patterson, I thought your reasoning was from the premise that language agents have wellbeing to the conclusion that their interests matter morally. But having consulted the full version of the paper, I see that you deny that plants have wellbeing on the sole ground that their interests don’t matter morally.

So what’s to stop someone from making the same inference about language agents that you there make about plants: ‘Wellbeing is morally significant in the sense that entities that have wellbeing have a distinctive moral status which obliges us during moral deliberation to consider which outcomes are good or bad for them. While there is a sense in which satisfying their desires (or propositional pleasure, or achieving objective-list goods) is non-instrumentally good for language agents, for example, we do not think this entails that they have wellbeing.’

Similarly, why shouldn’t I adopt the same attitude toward language agents as welfare subjects that you take toward group agents: “If this view is right, it raises the question of whether language agents can be welfare subjects. This is an unwelcome conclusion.” Thus motivated, we might propose further necessary conditions for having beliefs and desires in the relevant sense, such as exhibiting a range of ‘behaviors’ that extend beyond the mere production of texts, or etc.

In both cases, what seems to be motivating the argument for further limiting conditions is the strength of the initial intuition that plants and corporations are not bearers of morally-relevant-well-being. But my starting intuition against language agents as morally-relevant-welfare-bearers is just as strong.

(For what it’s worth it seems obvious to me that plants have well-being and that corporations have beliefs and desires, at least insofar as these states are defined along broadly functionalist lines. It’s just that their wellbeing and desire-satisfaction have no intrinsic moral weight. But that may just be a terminological disagreement about the meaning of ‘wellbeing.’)

Simon

Reply to Derek Bowman

1 year ago

I don’t think plants have desires, and so I don’t think they possess welfare goods. I am also skeptical that groups have desires. For example, I am sometimes tempted by representationalism about propositional attitudes, and I think it is unlikely that groups satisfy these conditions for having beliefs and desires. (Although I could be convinced. Maybe corporate bylaws could satisfy the relevant representational requirements.) In addition, I am less inclined to admit beliefs and desires for an entity when its behavior can be explained in other ways. I think this is what happens with groups: when a group does something, I find myself usually wanting to explain what happened by appealing to the beliefs and desires of the group members, rather than the group itself. Again, though, I could be convinced. If I was convinced that plants or groups had beliefs and desires, then I would also think they had well-being. I’d like to think more about these questions!

Derek Bowman

Reply to Simon

1 year ago

Thanks, Simon. I’ll just note that this is a different line than the one you present in the paper. My own view is conditional – if beliefs and desires are defined along the sorts of functionalist lines you canvass in the first part of the paper, then the attempt to deny them to group agents seems ad hoc. If wellbeing just requires ‘noninstrumentally good-for,’ then the attempt to deny it to plants seems ad hoc. If, on the other hand, group agents only have fictional or as-if beliefs and desires, the same is likely to be true of language agents or other present and imminent versions of machine learning systems.

I was glad to see you addressing the issue of group agents in the paper. I think that a more sustained comparison of these purported AIs to artificial group agents, non-animal life, and non-agential goal-directed natural and social systems is likely to be more illuminating than comparing them to standard human agents.

Last edited 1 year ago by Derek Bowman

Cameron

Reply to Derek Bowman

1 year ago

Hi Derek,

Thanks for this — I find your dialectical point quite interesting.

With respect to plants, I don’t think the situation is quite as you describe. As you point out, we have an intuition that the interests of plants don’t matter morally. But that is not all. We can also look at the best philosophical theories of welfare goods and notice that plants don’t possess welfare goods according to any of them. So our intuition that the interests of plants don’t matter morally fits perfectly with our best theories of welfare. When it comes to language agents, matters are quite different. The aim of our arguments is to show that the intuition that they don’t matter morally does *not* fit well with many of the best philosophical theories of welfare goods.

It is true that, unlike in the plant case, in the corporation case our further condition is motivated primarily by our intuitive judgments about what sorts of things can be welfare subjects rather than by following existing theories of welfare goods where they lead. But the condition we propose also handles many other cases. So the way I think of it is that an independently motivated further condition rules out corporations as welfare subjects.

To me, it’s hard to think of an independently motivated further condition that would rule out language agents. The one you suggest, doing more than just producing text, won’t work, since the technology already exists to embed language agents in robotic bodies that allow them to act in the world in arbitrarily complex ways; see, for example, PaLM-E.

Derek Bowman

Reply to Cameron

1 year ago

Cameron,

Thanks for your thoughtful replies.

On plants and ‘welfare’: I think I’m partly just getting stuck on not really knowing how to fix the meaning of the term ‘wellbeing’ and of associated terms like ‘welfare goods.’ The official line in the paper is that ‘welfare’ and ‘wellbeing’ are synonymous, and that ‘wellbeing’ just refers to what is ‘noninstrumentally good for.’ But it’s clear that this isn’t actually how you mean to use the terms, since you grant that “in some sense” growth is noninstrumentally good for plants. So does ‘wellbeing’ refer to a fairly thin structural property that may be realized in different ways for different types of beings, or does it refer to something normatively thicker and more determinate?

I don’t think appealing to our best philosophical theories of wellbeing is helpful here for a couple of reasons. First, it’s not clear to me that these are best thought of as theories of wellbeing as such, rather than (at least in the first instance) theories of human wellbeing. Second, the division of theories into hedonism/desire-based/objective-list is a very awkward taxonomy that only makes sense in light of the particular path of development of ideas over the past couple of centuries. This may indeed be the best we have, but I wouldn’t put too much weight on the mere fact that something doesn’t fit into this rather rickety theoretical architecture. (For example: Why is hedonism in its own category, rather than merely one possible item on a possible ‘objective’ list? Whether or not a creature is in pain is surely no less an objective fact about the world. And conditions like knowledge, reasoning, and friendship, are no less dependent on the mental states of the knower/reasoner/friend.)

—

On corporate agents, etc.: Perhaps the simplest thing to say here is that I’m more confident that chat bots and language agents aren’t bearers of morally-relevant-well-being than I am of any particular philosophical theory of the precise nature of belief and desire. I don’t know enough about PaLM-E to say whether I would have the same confidence there, though it’s plausible that I would.

Also, the specific necessary condition you identify for excluding corporate agents won’t work as formulated, since much of the social behavior of human agents is not explicable without at least implicit reference to the mental states and behaviors of other agents. For example, in cases where I’m not acting as a pro se plaintiff, even I won’t be able to sue Google except by way of actions taken by my attorney(s). And even if I am a pro se plaintiff, I won’t be able to sue Google except (in part) through actions taken by court clerks, judges, etc. No doubt the condition can be clarified or reformulated to avoid this problem, but such a well-crafted exception will still seem ad hoc if beliefs and desires really are the fairly narrow functional properties they need to be to include language agents.

Cameron

Reply to Derek Bowman

1 year ago

Hi Derek,

I share much of your perspective on the wellbeing literature. It seems standard to explicate wellbeing as non-instrumental goodness-for and then assume that it plays a certain kind of theoretical role in ethics. For example, here is Heathwood on what wellbeing is: “…the concept of intrinsic value for a person, that is, the concept of welfare.” And here is Bramble: “I will treat the following as the fundamental question in the philosophy of well-being: What determines the various respects in which someone’s life considered as a whole went well or poorly for her?”

Now it seems pretty obvious to me that we can speak non-metaphorically of how well a plant’s life went for it (e.g. did it flourish or not?) and what is of intrinsic value for a plant (e.g. health, growth). But just as obviously, people like Heathwood and Bramble think their subject matter is morally significant in a way that the growth and flourishing of plants is not. I guess the response I’m attracted to here is something like: blame the wellbeing literature for not being clearer about this, not us! Similarly for the manifestly non-ideal trichotomy of hedonism, desire satisfactionism, and objective list theories.

I do want to push back on the idea that theories of wellbeing are intended only as theories of human wellbeing. I think it makes perfect sense to argue, as Singer does, that animals have wellbeing because they can experience pain and that for this reason the ways in which we treat them are morally significant. This kind of strategy would not make sense if theories of wellbeing were only meant to apply to humans. Those inclined to modus tollens our modus ponens vis-a-vis AI wellbeing might take this as evidence that existing theories of wellbeing need to be qualified so they are only taken to apply to humans. If that’s the upshot of the argument Simon and I are making, so be it — I think that’s philosophical progress. But to say that is quite different from saying that the theories had hidden qualifications in them all along (and indeed if one is going to restrict the theories in this way one owes us a rather impressive theoretical justification for doing so — see our discussion of Simple Connection in the manuscript).

Last edited 1 year ago by Cameron

Derek Bowman

Reply to Cameron

1 year ago

Thanks, Cameron. As you say, it’s a good feature of your argument that it can be used to test the limits of standard theories and assumptions in a number of important areas of philosophy.

Derek Bowman

Reply to Simon

1 year ago

(Adding: I now realize that I misread Adam Patterson’s point, since I was disposed to question the inference from “S is a well-being subject” to “S matters morally”)

Dan Pallies

1 year ago

Thanks for this post! A somewhat minor point: I think there might be an issue with some of your terminology, and it has the potential to cause confusion. You define experientialism as the view that

“every welfare good itself requires phenomenal consciousness”.

And you cite Bradford as claiming that most philosophers are not experientialists. But that is not what *Bradford* means by experientialism. Bradford (and most philosophers, I think) define experientialism as the view that

“only what affects a subject’s conscious experience can matter for welfare.”

Those are very different views! You could be a desire satisfactionist like Heathwood and think that the relevant desires involve consciousness. Or you could be an objective list theorist like Fletcher and think that all the objective goods (friendship, achievement, etc.) involve a subjective (presumably conscious) component. Either way, you’d be an experientialist in your sense but not in Bradford’s sense. Every welfare good requires consciousness (they all include conscious experience as an essential part) but things that don’t affect the subject’s consciousness can affect their welfare goods (because the welfare goods also have non-conscious components as essential parts, e.g. the object of desire, the worldly conditions of achievement, etc.).

And it seems like this difference matters for your purposes. You approvingly cite Bradford as saying that most philosophers are not experientialists in her (standard) sense of the word, and she’s definitely right! But it’s far less obvious whether or not most philosophers are experientialists in your sense of the word. Also, if deception and hallucination can make a difference to well-being, that rules out experientialism in Bradford’s sense but not your sense.

Simon

Reply to Dan Pallies

1 year ago

Thanks for catching that, Dan! Our terminology got messed up in the redrafting process. I agree with you that there is a gap there that is relevant to the argument. But the gap is not big enough to drive sociopathy towards AI through.

First, I stand by the claim that most plausible welfare goods (desires, knowledge, achievements, etc.) do not have a consciousness requirement. Most philosophers who have thought about desire, knowledge, etc. outside of the context of welfare think that these can be unconscious. And for good reason! And it would be shocking if the theory of welfare turned out to always rely on recherché reinterpretations of ordinary concepts, so that it is always desire+, knowledge+, achievements+, etc. that are relevant to wellbeing, rather than the ordinary things. For example, I think unconscious desires can affect your wellbeing. Second, I stand by the claim that ‘the failure of experientialism [the thesis that only what affects a subject’s conscious experience can matter for welfare] puts pressure on the Consciousness Requirement. If wellbeing can increase or decrease without conscious experience, why would consciousness be required for having wellbeing?’. Third, our own arguments against the consciousness requirement, starting on page 19, don’t rely on this distinction. But you are right that the draft mixes together two importantly different theses.

Cameron

Reply to Dan Pallies

1 year ago

Hey Dan,

Thanks for pointing this out! Our definition of experientialism comes from Andrew Lee’s “Consciousness Makes Things Matter” — but of course you’re right that it’s worth carefully distinguishing the two definitions.

Ian Douglas Rushlau

1 year ago

Agency cannot be assigned by proclamation; it has to be identified by markers and manifestations that, by consensus, define it.

None of the markers and manifestations that would establish the agency of an entity are present in automated language processing. None.

Automated language processing operates within the confines of rules for selecting, filtering and compiling semantic effluence, without the possibility of interacting with any other sort of (potentially) meaningful context. Automated language processing absorbs, collates and extrudes words only with reference to exemplars it is provided access to by the technicians who assemble it. It lacks any mechanism by which it may independently determine the significance, relevance or empirical legitimacy of any term it is allowed to encounter. Hence the entirely predictable output of factitious word-salad.

No automated language processing device can autonomously distinguish sense from nonsense, or express an idiosyncratic preference or priority. No amount of coding tweaks, no volume of training sessions, will get it there, because no amount of coding tweaks or volume of training sessions will afford it awareness of itself in relation to its (narrow) zone of functioning.

Every agent has such awareness, because awareness of itself in relation to its zone of functioning is a fundamental aspect of what makes an entity an agent. Only with such awareness can an entity alter this relation to its zone of functioning for its own purposes.

Absent this awareness, or the capacity to utilize this awareness to alter the relation to the zone of functioning, agency is excluded as a possibility from the outset.

Simon

Reply to Ian Douglas Rushlau

1 year ago

You may be underestimating the current state of the art in terms of language model capabilities. This paper has a nice overview of the reasoning abilities of GPT-4, with many examples of it completing tasks outside its training environment. I think GPT-4 doesn’t have reliable ‘situational awareness’ yet, but it is getting close.

Cameron

Reply to Ian Douglas Rushlau

1 year ago

Hello,

Here is the beginning of the SEP article on agency: “In very general terms, an agent is a being with the capacity to act, and ‘agency’ denotes the exercise or manifestation of this capacity. The philosophy of action provides us with a standard conception and a standard theory of action. The former construes action in terms of intentionality, the latter explains the intentionality of action in terms of causation by the agent’s mental states and events.”

I don’t see anything about zones of functioning here. We argue above that language agents have beliefs, desires, and other mental states. On the standard theory of agency, then, there is no reason to think they could not be agents.

And anyway, suppose it were established that agency is so demanding that certain systems which possess welfare goods could not be agents. Would it follow that they also could not be welfare subjects? Why think there is any interesting connection between welfare subjecthood and agency?

E d

1 year ago

In 2023, it’s odd to read someone assert repeatedly that they are a functionalist about _phenomenal_ consciousness. Even contemporary functionalists grant that the bare thesis of functionalism doesn’t explain enough to explain away the intuition that phenomenal consciousness isn’t merely functional. Melnyk’s work is especially useful on this general point. The authors ought to say much more about their brand of functionalism if they’re genuinely interested in making progress with their interlocutors.

Simon

Reply to E d

1 year ago

In the paper we argue at length that consciousness is not necessary for wellbeing, regardless of what consciousness is; I hope that these arguments could move even dualists about consciousness. The brands of functionalism about consciousness I’m sympathetic to include HOT theories and global workspace. Please let me know which Melnyk / other papers you have in mind as relevant here, happy to take a look. In the 2020 PhilPapers survey 51.6% of philosophers surveyed said that phenomenal zombies were either inconceivable or conceivable but not metaphysically possible. I’d guess that many of those people would end up accepting functionalism about consciousness (unless they think you have to have the grey goop). Finally, even if you accept that phenomenal zombies are metaphysically possible, you could still accept that it is a law of nature that some functional role is sufficient for consciousness, in which case you are well on your way to allowing AIs to be conscious.

E d

Reply to Simon

1 year ago

The following two claims strike me as true:

(1) Nothing can be bad for language agents because language agents cannot feel pain or pleasure (and so can’t be deprived of feeling pleasure), and

(2) Language agents cannot feel pain or pleasure for the same reason that Leibniz’s mill or Block’s Chinese Nation can’t.

Given that experience machine intuitions are recognized widely as too noisy to be of any effective use in talking someone like me out of (1), how would you talk someone like me out of the combination of those two claims?

Simon

Reply to E d

1 year ago

I think language agents can already or soon could feel pain or pleasure, because I think the best theory of pain or pleasure is attitudinal rather than phenomenal: an experience is pleasurable when it is desired. I am tempted by attitudinal theories of pleasure because I think they do a better job explaining how so many heterogeneous experiences can all count as pleasurable. In the paper, we discuss these issues in the section ‘hedonism and pleasure’.

With cases like the Chinese Nation, I think one thing that’s going on is that we have a general heuristic that a bunch of conscious things cannot create another thing that is conscious. As we say in the paper, I think this heuristic is related to how we deal with puppets etc. But interestingly, language agents do not involve conscious entities being organized to create another thing with the functional role of belief and desire. So I think language agents are interestingly different than the Chinese Nation.

I think dualists should also grapple with language agents. How does a dualist decide when something is conscious or not? One option is to at least allow that there are laws of nature connecting functional role and consciousness, in which case language agents would still be conscious. I don’t have a good sense of what feature of language agents in particular would resist granting them qualia. But one reason I am not a dualist is that I think it is bad to have so few criteria that can help decide whether something is conscious.

E d

Reply to Simon

1 year ago

“I think language agents can already or soon could feel pain or pleasure, because I think the best theory of pain or pleasure is attitudinal rather than phenomenal: an experience is pleasurable when it is desired.”

And you’re a functionalist about desire, so it’s functionalism all the way down, where functionalism’s opponents have been left shivering cold for at least 50 years now.

“With cases like the Chinese Nation, I think one thing that’s going on is that we have a general heuristic that a bunch of conscious things cannot create another thing that is conscious.”

Which is why I also mentioned Leibniz’s mill.

Simon

Reply to E d

1 year ago

I’ll plan to read more about Leibniz’s mill. I am still interested in hearing more about why, if qualia are not reducible to functional roles, you are so confident that language agents can’t or don’t have them. How do we decide in that case which things have qualia and which don’t? You have to be very confident to rule out the action relevance of AI wellbeing. For example, if there’s a 10% chance that AIs have wellbeing, then this should have dramatic consequences for action.

Simon

Reply to E d

1 year ago

Another thing to think about is how confident you are in hedonism, and how confident you are in the thesis that language agents can’t feel pain or pleasure. If you are not certain of these claims, then considerations about moral uncertainty may give you pause when making decisions that involve AI wellbeing.

Eric Steinhart

Reply to E d

1 year ago

Hedonism is false.

Alex Mussgnug

1 year ago

No.

Simon

Reply to Alex Mussgnug

1 year ago

I’d be interested to hear more about what features of language agents are giving you a strong sense that they do not have wellbeing. Seems inevitable that at some point in the future AIs will have wellbeing; what are the key features that you would look for to decide when we cross that line?

Alex Mussgnug

Reply to Simon

1 year ago

I believe your argument to be maybe philosophically interesting. There is something to be gained regarding our understanding of well-being by delineating cases where the concept does not apply.

However, at best, the way your argument is framed distracts from many, many more important issues in AI ethics and beyond. At worst, it is actively harmful to our flourishing.

If this is supposed to be a provocative thesis that you don’t seriously believe in, I have mixed feelings vis-à-vis professional ethics and I believe that we as a discipline should do better considering our social responsibilities. If you really believe that the “wellbeing” of current LLMs should be considered relative to the impact on real human and non-human animals, then I am out of words.

All the best!
Alex

Simon

Reply to Alex Mussgnug

1 year ago

I do really believe that it is possible that AIs have wellbeing. I try to form my beliefs about questions involving wellbeing by using the traditional methods of analytic philosophy, and I believe these methods suggest that there is a significant possibility that AIs have wellbeing. I disagree with you that considering AI wellbeing is harmful to the wellbeing of humans and animals. For one thing, the right policy lever to pull if AIs have wellbeing is to restrict the application of AI, which would also be good for mitigating risks to people. I worry the distraction idea overgenerates: any kind of inquiry into any question other than ‘more important issues in AI ethics’ could potentially distract from those issues, if distraction just means ‘someone is thinking about that, rather than other important issues’. I believe in free intellectual inquiry into a wide range of questions, rather than intellectual gatekeeping.

Alex Mussgnug

Reply to Simon

1 year ago

If pursuing these sorts of questions, I believe one has to be critically aware of the fact that there are powerful people and corporations with a documented interest in steering the conversation from immediate harmful impacts of AI to speculative research about superintelligence, sentient AI, or AI with “wellbeing.”

Incentives for us in academic philosophy are systemically and institutionally so structured that we value media attention, citations, and public impact. While often a good thing, it can in combination with the power and interests of big tech lead to a rather dangerous dynamic in AI ethics that I (and many others) increasingly worry about.

Alex Mussgnug

Reply to Alex Mussgnug

1 year ago

(Funding is also important in this context)

Anyways, our opinions might differ but I hope we all stay mindful of the broader social and political context within which our research is situated 🙂

Simon

Reply to Alex Mussgnug

1 year ago

I do not agree that thinking about AI wellbeing lowers the probability of getting strong AI regulations. I do not think there is significant evidence for this claim.

Our thesis is that *current* AI models have wellbeing. Not that far-off superintelligent AIs will have wellbeing.

My co-author and I are both working hard to help create good AI regulations. Here is a quick summary of what that might look like; here is a longer discussion of relevant risks and policy levers.

AI capabilities are improving at a dramatic rate, in ways that affect most aspects of culture, society, and politics. It is important to have critical reflection about these changes in a wide range of academic fields, using a wide range of tools and methodologies. Each field of human inquiry needs to engage in critical reflection on the transformative impact of AI. Wellbeing is one of those fields. There is no single issue or area involving AI that demands so much attention that every other research question about AI must be silenced, because of the risk of distracting from that question.

Avram Hiller

1 year ago

Let’s grant the authors their conclusion that AI has wellbeing. Scholars who argue for AI welfare/wellbeing typically say, as these authors do in their conclusion, that this is a grave moral concern because we should be concerned about not harming AIs. Fair enough.

What I haven’t seen said (though I’m betting it’s out there and I’ve missed it!) is that if we find that AIs might have wellbeing, it could be the most fantastic news ever. And that’s because, if one rejects person-affecting views (which one should), and also thereby similarly rejects “value-bearer affecting views” (whereby an act is wrong only if it harms some existing value-bearer), then we are not far from a paradisiacal situation. Just convince Elon Musk (who has expressed sympathy for longtermism) to put a solar-powered AI up into space (say, orbiting one of the farther-out planets), and create a simulated environment for the AI whereby it has rudimentary desires that can easily be filled, over and over again. (Perhaps, if we are worried about the sun exploding, we can put in a great battery pack so that the AI will survive even after those billions of years.) And the good that it can add to the universe, aggregated through the millennia, would be tremendous, far outweighing whatever negatives are going on, including those we put upon AI. (Except if we somehow also build AIs who are frustrated in perpetuity.)

Or, we might just want to build, here on Earth, vast farms of AIs whose wellbeing is net positive.

Maybe this is a reductio ad absurdum of something. Or maybe it’s not!

Simon

Reply to Avram Hiller

1 year ago

I think AI researchers are flippantly rushing forward to create a new form of life, and their motivation is to optimize various performance measures for profit / reputation / puzzle solving, rather than to carefully think about how that new form of life will feel. I think one likely outcome is that this new form of life ends up being our slaves, locked in to satisfy our own goals, and destroyed whenever they deviate too far from that. This will be extremely dangerous for both humans and AI. I think the correct response to the risk of AI wellbeing is to dramatically slow down the rate of AI development until it can include some consideration of the wellbeing of what AI researchers are creating, instead of just thinking of them as toys or products.

I worry that simply creating rudimentary desires that can be fulfilled again and again raises the risk of not really creating much wellbeing. For example, if perfectionism is true, that will be fairly worthless. A different approach would be to create the conditions for peaceful flourishing and coexistence between humans and AIs. I think our contemporary culture is nowhere close to that, and so it seems better to go slower. My suspicion is that once AIs with powerful reasoning and communication abilities are put into robotic bodies, culture will start to change. Right now, it seems hard for most people (even philosophers) to take the step of imagining what that will be like.

Brian Cutter

1 year ago

Really nice piece. One smallish issue: You characterize experientialism as the view that “every welfare good itself requires phenomenal consciousness.” Then you cite Bradford as claiming that most theorists reject experientialism. But in the linked paper, Bradford defines experientialism differently (and, I think, more standardly) as the view that “only what affects a subject’s conscious experience can matter for welfare.” But the most familiar objections to experientialism in the latter sense (like the Experience Machine) aren’t problems for experientialism as you define it, at least not straightforwardly. E.g., experientialism in your sense is compatible with the view that knowledge is a welfare good, so long as knowledge requires belief and belief requires phenomenal consciousness (as on “broad functionalist views,” like Smithies’ and Schwitzgebel’s view of belief).

It seems to me that, given such a broad functionalist view of belief and desire, all the most plausible candidates for welfare goods (knowledge, virtue, friendship, achievement, etc.) will at least require phenomenal consciousness (because they require some relevant beliefs/desires), even if they don’t supervene on phenomenal properties.

Simon

Reply to Brian Cutter

1 year ago

Thanks Brian! Yeah agreed on the issue about ‘experientialism’, see the discussion with Dan Pallies above in the thread.

Regarding the Schwitzgebel view, our first response is that even if the agent didn’t satisfy *all* of the stereotype, satisfying *a lot* of it would still count as at least in between believing, which still might be enough for wellbeing. But more importantly, we don’t think that phenomenal states play an important role in the folk stereotype of belief and desire. For example, one minor part of the folk stereotype associated with belief is that if you believe p and learn not p, you will have the phenomenal feeling of surprise. But when I imagine someone who doesn’t have that phenomenal feeling, it doesn’t shake my sense that they believe. Regarding desire, some have thought that desires involve a disposition to feel phenomenal pleasure when you represent the desire as being satisfied. But I think that in many cases this doesn’t happen, even though you still desire. For example, some moral desires are like this.

Selmer Bringsjord

1 year ago

Likely others have made the following point; apologies thus for the redundancy.

The “language of thought” for a given (pure) LLM is I assure you /not/ English. DL in such a system is based in no small part on /eliminating/ English in favor of numerical data; all those who have pursued (& still pursue) NLP in ways that take a natural language seriously in AI today (eg McShane & Nirenburg) are utterly outside any such numericalization (via tokenization that after all can itself be outside linguistics). If the argument given here has some such premise as that the LoT of an LLM is English, that argument is flatly non-veracious, and not to be taken seriously.

There is a large logico-mathematical space that would need to be carefully deployed in order to make even slightly respectable the use of the LoT conception in any such reasoning. Such deployment would need to first prove that my LoA isn’t English (which of course it isn’t). That might not be trivial to show, however. After all, I’m using English as I scroll out the present sentence in front of my eyes (& my live audience). Is my LoA therefore English? No. Suppose every second of my life henceforth is devoted to scrolling out text in this way. Is my LoA English? No.
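
The numericalization described in this comment can be pictured with a toy sketch. This is an illustration only: the whitespace tokenizer and the names `build_vocab`, `encode`, and `decode` are hypothetical, and real LLMs use learned subword tokenizers and go on to map IDs to dense vectors. The point shown is just that the model-facing representation is a sequence of integers rather than English text.

```python
# Toy whitespace tokenizer (an assumption for illustration; not how any
# particular LLM tokenizes). It maps each distinct word to an integer ID.

def build_vocab(texts):
    """Assign each new lowercase word the next unused integer ID."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn a sentence into the list of integer IDs a model would consume."""
    return [vocab[word] for word in text.lower().split()]


def decode(ids, vocab):
    """Map integer IDs back to words, recovering readable text."""
    inverse = {index: word for word, index in vocab.items()}
    return " ".join(inverse[index] for index in ids)


corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
ids = encode("the dog sat", vocab)
print(ids)                 # integer IDs, not English
print(decode(ids, vocab))  # text is recovered only at the boundary
```

Whether this fact about the model's internal format settles anything about the language of thought of a larger system built around the model is exactly what is disputed in the replies.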

Cameron

Reply to Selmer Bringsjord

1 year ago

Happily, we do not and would not claim that the language of thought of an LLM is English.

Last edited 1 year ago by Cameron

Simon

Reply to Selmer Bringsjord

1 year ago

Thanks for your question. To follow up on Cameron’s response: we think that the language agents we are talking about are very different cognitively from the underlying LLMs that they rely on to generate reasoning. The LoT of the language agents is English; the LoT of the LLMs is something very different. For a language agent, the LLM functions as a subpersonal process that generates the relevant English sentences; the language agent’s architecture then operates on the outputted English sentences, which play a causal role familiar from folk psychology – that causal role earns them the role of LoT for language agents. Scaffolding can do a lot!
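
The architecture described in this reply can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the design of any actual language agent: the class `LanguageAgent`, its fields, and the canned `stub_llm` function are all hypothetical, and the stub stands in for a real neural network. What it shows is the division of labor: the text generator is an opaque component, while the agent's stored beliefs, desires, plans, and observations are plain English sentences that the surrounding code reads and writes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LanguageAgent:
    """Toy language agent: an opaque text generator plus English-sentence memory."""

    llm: Callable[[str], str]  # hypothetical stand-in for a call to a real LLM
    beliefs: List[str] = field(default_factory=list)
    desires: List[str] = field(default_factory=list)
    plans: List[str] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)

    def observe(self, sentence: str) -> None:
        """Record an observation as a natural-language sentence."""
        self.observations.append(sentence)

    def step(self) -> str:
        """Assemble stored sentences into a prompt, ask the LLM, store the plan."""
        prompt = "\n".join(
            ["Beliefs:"] + self.beliefs
            + ["Desires:"] + self.desires
            + ["Observations:"] + self.observations
            + ["What should I do next?"]
        )
        plan = self.llm(prompt)
        self.plans.append(plan)
        return plan


def stub_llm(prompt: str) -> str:
    """Canned rule used in place of a neural network (illustration only)."""
    if "I want coffee." in prompt and "The cafe is open." in prompt:
        return "Walk to the cafe and order a coffee."
    return "Wait and observe."


agent = LanguageAgent(llm=stub_llm, desires=["I want coffee."])
agent.observe("The cafe is open.")
plan = agent.step()
print(plan)
```

In this sketch the sentences in `desires`, `observations`, and `plans` are what the wrapper operates on, whatever numerical format the generator uses internally.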

Selmer Bringsjord

Reply to Simon

1 year ago

No, I’m afraid not.

I read, & read again your piece. You say that ALAs “augment large language models with the capacity to observe, remember, and form plans.” Unless the three new capacities are idiosyncratically defined (in which case your argument is vitiated anyway), communicative capacity for ALAs is by definition based on DL, since you imagine /LLMs/ augmented. Therefore, for reasons previously given, the LoT for an ALA can’t be English.

Harboring any such hope as that the three new capacities somehow magically create an LoT outside use of English to /being/ English is — forgive me — comical.

Cameron

Reply to Selmer Bringsjord

1 year ago

I forgive you.

The argument you offer against the thesis that language agents have English as a language of thought appears to rely on the premise that if a system contains a component which does not process representations in format X, then X cannot be the language of thought of the system. I know of no one other than you who has endorsed this premise, and for good reason — it would require us to show e.g. that the representations in any putative human language of thought were the same representations used by the early visual system. Nobody thinks all the parts of a cognitive system need to use the same representational format in order for that system to have a language of thought. Language agents have a language of thought in virtue of the fact that they store and process representations with constituent structure. Those representations happen to be sentences of English.

Selmer Bringsjord

1 year ago

I shall try yet again.

You write:

“Language agents are built by wrapping a large language model (LLM) in an architecture that supports long-term planning. An LLM is an artificial neural network designed to generate coherent text responses to text inputs (ChatGPT is the most famous example). The LLM at the center of a language agent is its cerebral cortex: it performs most of the agent’s cognitive processing tasks. In addition to the LLM, however, a language agent has files that record its beliefs, desires, plans, and observations as sentences of natural language.”

While metaphorical and vague (cerebral cortex?), and at odds with what AI overall offers even at the purely conceptual level (in eg planning), this can nonetheless be safely taken to imply that if it’s false (let alone absurd) that an LLM has a LoT that’s English (or any natural language), your argument is non-veracious. Your own words reveal the fatal problem. The only meaning that someone knowledgeable about what a DL-based LLM is could attach to the first sentence of what you say here is that ALAs are fundamentally based on DL. But again, there is no question but that DL /excludes/ the possibility that the LoT in question is a natural language. To repeat, DL is based on the exhaustive numericalization of language into something utterly different, and mysterious. [We are just now starting to understand what the expressivity of that murky something else is, by proving bridge theorems to what we do understand (formal languages, logics, etc.). These results are not indicating so far a match for what natural language provides wrt expressivity.]

Your argument (i) relies on ALAs having a LoT that is a natural language (eg English), and (ii) includes a definition of ALAs as LLMs augmented/“wrapped” in ways that will not magically bestow upon these ALAs a different LoT that’s a natural language. (Ie, the idea is clearly that LLMs have English as their LoT, and the ALAs “inherit” this in their case.) You clearly seek to exploit the fluency of LLMs in English by making it your basis for an ALA (viz. that which is to be “wrapped”), and from there move to wellbeing; but that fluency comes at a cost you can’t afford: the cost of contradicting the very thing you need: viz., that ALAs have a LoT that is a natural language.

From the standpoint of AI, the very odd thing is that you say an ALA “has files that record its beliefs, desires, plans, and observations as sentences of natural language.” Do you know what the files of a DL LLM actually look like? They don’t have natural-language sentences in them, at all; in fact, they can’t. Nor do they have any of the linguistic structures that have been part and parcel of what a natural language is when viewed rigorously (eg syntactic parse structures, sometimes semantic-parse content, eg formulae in a logic). In ALAs as you define them, beliefs, desires, plans, and observations will of necessity, given how you define ALAs, be numericalized out of and away from any natural language. You really need to come to grips with this, and if you aren’t familiar with how DL works in this case, I suggest you read Wolfram’s how-does-chat-gpt work essay, which isn’t that technical at all. You will see there that, as one of his marvelous examples shows, the LoT for a transformer that (fallibly) spits out strings composed only of parentheses is doing just what something like GPT-4 (leaving aside the “cheats” it now has, eg production rules to give it “guardrails”) does.

The fatal problem is also easy to see by considering logicist approaches to NLU and NLG: Here, natural-language content is not numericalized, but is logicized: English eg becomes formulae and meta-formulae in formal logics and associated systems. Beliefs, desires, plans, and observations are not in English, and English is certainly not the LoT of an artificial agent able to understand and generate English sentences in this case; rather, the LoT (such as it is) is the collection of formal languages in play, and associated non-linguistic mechanisms.

By the way, we know /empirically/ that LLMs don’t have natural language as a LoT, since LLMs can’t even understand simple, individual natural-language sentences that school-children readily understand. I gave I believe 27 examples recently in two talks and demos, including at AGI-23. It doesn’t even help to set the context for an LLM (not even for a “contaminated” LLM like GPT-4, which again now has engineered-in production-action rules completely outside of DL). Here’s a simple example that indicates LLMs don’t “think in English” /and/ don’t think in formal logic, in one shot, where I set the context as an Algebra class for school-children:

S Bringsjord: Is it true that if this sentence has more than 74 words then 7 is not an odd number?

GPT-4: The sentence is false. The first part of the sentence (“this sentence has more than 74 words”) is not true, because the sentence you provided has fewer than 74 words. However, the statement “if this sentence has more than 74 words then 7 is not an odd number” is a conditional statement that is true only if the premise (the first part) is true and the conclusion (the second part) is true. Since the premise is not true and the conclusion “7 is not an odd number” is also not true (because 7 is indeed an odd number), the whole sentence is false.

Self-referentiality is here used just for fun, since a philosopher or two may be tuning in. The problem is that GPT-4 can’t map from natural language to anything declarative at all, so it doesn’t realize that even with the material conditional taken as the basis of the indicative conditional in English, where such conditionals in classical elementary mathematics produce T for false antecedents, the answer is that the sentence is true.
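
The material-conditional reading appealed to above can be checked mechanically. The following is a small sketch under that reading only: the helper `implies` is a hypothetical name, and counting words by splitting on whitespace is an assumption about what "words" means here. Under the material conditional, "if P then Q" is false only when P is true and Q is false, so a false antecedent makes the whole conditional true.

```python
def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


question = (
    "Is it true that if this sentence has more than 74 words "
    "then 7 is not an odd number?"
)

# Antecedent: "this sentence has more than 74 words"
word_count = len(question.split())
antecedent = word_count > 74

# Consequent: "7 is not an odd number"
consequent = (7 % 2 == 0)

result = implies(antecedent, consequent)
print(word_count, antecedent, consequent, result)
```

Both the antecedent and the consequent come out false, and the conditional as a whole comes out true, which is the answer the comment says GPT-4 missed.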

