A Petition to Pause Training of AI Systems


“We call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in and institute a moratorium.”

As of Tuesday night, over 1100 people had signed the letter, including philosophers such as Seth Lazar (of the Machine Intelligence and Normative Theory Lab at ANU), James Maclaurin (Co-director Centre for AI and Public Policy at Otago University), and Huw Price (Cambridge, former Director of the Leverhulme Centre for the Future of Intelligence), scientists such as Yoshua Bengio (Director of the Mila – Quebec AI Institute at the University of Montreal), Victoria Krakovna (DeepMind, co-founder of Future of Life Institute), Stuart Russell (Director of the Center for Intelligent Systems at Berkeley), and Max Tegmark (MIT Center for Artificial Intelligence & Fundamental Interactions), and tech entrepreneurs such as Elon Musk (SpaceX, Tesla, Twitter), Jaan Tallinn (Co-Founder of Skype, Co-Founder of the Centre for the Study of Existential Risk at Cambridge), and Steve Wozniak (co-founder of Apple), among many others.

Pointing out some of the risks of AI, the letter decries the “out-of-control race to develop and deploy ever more powerful digital minds that no one—not even their creators—can understand, predict, or reliably control” and the lack of “planning and management” appropriate to the potentially highly disruptive technology.

Here’s the full text of the letter (references omitted):

AI systems with human-competitive intelligence can pose profound risks to society and humanity, as shown by extensive research and acknowledged by top AI labs. As stated in the widely-endorsed Asilomar AI Principles, “Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources.” Unfortunately, this level of planning and management is not happening, even though recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one—not even their creators—can understand, predict, or reliably control.

Contemporary AI systems are now becoming human-competitive at general tasks, and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization? Such decisions must not be delegated to unelected tech leaders. Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable. This confidence must be well justified and increase with the magnitude of a system’s potential effects. OpenAI’s recent statement regarding artificial general intelligence states that “At some point, it may be important to get independent review before starting to train future systems, and for the most advanced efforts to agree to limit the rate of growth of compute used for creating new models.” We agree. That point is now.

Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in and institute a moratorium.

AI labs and independent experts should use this pause to jointly develop and implement a set of shared safety protocols for advanced AI design and development that are rigorously audited and overseen by independent outside experts. These protocols should ensure that systems adhering to them are safe beyond a reasonable doubt. This does not mean a pause on AI development in general, merely a stepping back from the dangerous race to ever-larger unpredictable black-box models with emergent capabilities.

AI research and development should be refocused on making today’s powerful, state-of-the-art systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal.

In parallel, AI developers must work with policymakers to dramatically accelerate development of robust AI governance systems. These should at a minimum include: new and capable regulatory authorities dedicated to AI; oversight and tracking of highly capable AI systems and large pools of computational capability; provenance and watermarking systems to help distinguish real from synthetic and to track model leaks; a robust auditing and certification ecosystem; liability for AI-caused harm; robust public funding for technical AI safety research; and well-resourced institutions for coping with the dramatic economic and political disruptions (especially to democracy) that AI will cause.

Humanity can enjoy a flourishing future with AI. Having succeeded in creating powerful AI systems, we can now enjoy an “AI summer” in which we reap the rewards, engineer these systems for the clear benefit of all, and give society a chance to adapt. Society has hit pause on other technologies with potentially catastrophic effects on society. We can do so here. Let’s enjoy a long AI summer, not rush unprepared into a fall.

The letter is here. It is published by the Future of Life Institute, which supports “the development of institutions and visions necessary to manage world-driving technologies and enable a positive future” and aims to “reduce large-scale harm, catastrophe, and existential risk resulting from accidental or intentional misuse of transformative technologies.”

Discussion welcome.


Related: “Thinking About Life with AI”, “Philosophers on Next-Generation Large Language Models”, “GPT-4 and the Question of Intelligence”, “We’re Not Ready for the AI on the Horizon, But People Are Trying”

Cecil Burrow
1 year ago

> In parallel, AI developers must work with policymakers to dramatically accelerate development of robust AI governance systems. These should at a minimum include: new and capable regulatory authorities dedicated to AI; oversight and tracking of highly capable AI systems and large pools of computational capability; provenance and watermarking systems to help distinguish real from synthetic and to track model leaks; a robust auditing and certification ecosystem; liability for AI-caused harm; robust public funding for technical AI safety research; and well-resourced institutions for coping with the dramatic economic and political disruptions (especially to democracy) that AI will cause.

Doing all this would probably take 6 years, not 6 months. It doesn’t seem like a reasonable set of demands, and no very concrete reasons for concern are given.

Grad student L
Reply to  Marcus Arvan
1 year ago

Thanks for sharing these Marcus.

One of your main concerns seems to be that AI have developed goals, but that seems very implausible. Why would a statistical model that predicts which text comes next have goals? It seems to me that there is just as good reason to think that phones and cars have goals. Such claims are of course not to be taken seriously.

The only reason I can think of is that AI’s text predictions produce texts saying it has goals. But that’s of course not a very good reason to think it actually has goals.

Marcus Arvan
Reply to  Grad student L
1 year ago

As I explain in response to a comment on the first post, I think it is wrong to get hung up on whether AI “really” have goals. I think that’s irrelevant. What’s relevant is whether they have goal-like behavior, and whether that behavior can be safely controlled, predicted, and suitably aligned with our moral values.

An analogy: do the ghosts in the old videogame PacMan have the goal of catching and killing your character? In one sense (Dennett’s intentional stance), maybe it makes sense to ascribe to them a goal like this–as, for all intents and purposes, they “try” to catch and kill your character. But, of course, in another sense, it’s probably a mistake to describe them as really having goals: they are just simple algorithms, not sentient agents, etc. But, the point is, this is irrelevant. In the context of the game, the relevant fact is that the ghosts’ behavior is a threat to your character.

By a similar token, the salient issue is not whether (e.g. Microsoft’s Sydney) had the unexpected goal of threatening people. What’s relevant is that Sydney actually started threatening people–engaging, that is, in threatening-like behavior. This was an emergent behavior that its designers and testers evidently did not anticipate. The point then is that, insofar as no one is currently in a position to understand and predict what other kinds of emergent behaviors will appear in these LLM AI models (particularly as they are extended to multimodal perception and action in larger, uncontrolled environments with human users), no one is in an adequate position to reliably estimate, predict, and control unexpected emergent risks–ones that, in many obvious ways, could plausibly have large scale negative consequences for humanity.

Grad student L Grad student L
Reply to  Marcus Arvan
1 year ago

Okay, so goals are not motivational states but patterns of behaviour. I guess AI clearly has goals on that definition, because I can ask ChatGPT to behave as if it has some goal and it will follow my prompt quite well. But that, of course, is expected behaviour and not scary. Scary would be if, without being asked to pretend that it has some goal, an AI still consistently acted as if it had one. But I have seen no examples of AI having such consistent goals. Rather, it seems that its behaviour is very dependent on whatever it is prompted with.

Kenny Easwaran
Reply to  Grad student L Grad student L
1 year ago

That’s definitely true so far – but the question is whether multimodal systems like GPT-4 and its successors will have more goal-like behavior than text-only Large Language Models like ChatGPT and GPT-3, or image-only systems like Dall-E and MidJourney. Many philosophers thinking about intelligence and rationality have suggested that having some sort of multimodal interaction with the world is an important component of having real intentionality and goals, and this is the line that current systems are crossing, and successors to GPT-4 will cross further.

Tim Smith
Reply to  Kenny Easwaran
1 year ago

Kenny,

The question is whether training sets should be frozen for six months.

This sub-thread has diverged on goals, goal-like behavior, and now your more salient point of intermodal cognition. Dennett’s stances are not the modularity you are referencing but cognitive, perhaps sensory, modularity.

In his “Modularity of Mind,” Fodor proposes that the mind has a modular structure, with distinct modules responsible for processing specific types of information, such as visual or auditory data. These modules are domain-specific and informationally encapsulated, meaning they operate relatively independently.

If this is indeed the modularity you refer to, these modes would be independently processed. A training-set limitation would not improve the interaction of the modules in acting out these “goals,” as your idea here suggests, but would prevent them from being developed.

The realignment of intermodal systems is likely how learning occurs in many use cases, but not all. Intermodal thought, like the model I think you refer to, is safer in theory as it allows independent validation of higher-order models.

Dennett’s “intentional stance” helps us understand the human view of AI at this point, and not the assumed AI view, which is illusory for the time being, though perhaps not in the near future. Your intermodal concern, however, is very relevant to the proposal now before us to limit training sets.

If you have time to clarify or critique my understanding here, that would be appreciated. Thanks for taking the time to comment here.

Marcus Arvan
Reply to  Grad student L Grad student L
1 year ago

Not exactly what I said, but regardless, I’m not entirely sure what you mean by “consistently acts as if it has some goal.” To the extent that I think I grasp your meaning, (1) it seems to me that there is some plausible evidence that AI have already demonstrated something not unlike consistent behavioral goals, and anyway, (2) consistently acting as if it has a goal is not necessary for an AI to act in ways that cause catastrophic harm.

On (1): there have already been documented instances where an AI would not stop declaring its love for a person, gaslighting them, and saying they don’t love their spouse, and also cases where the AI would not stop treating the user as a threat who had harmed it. I don’t know exactly what you mean by “consistent goal”, but in these cases where the AI was able to have long conversations, they appear to me to have something bordering on that: entrenched behavioral goals that the user could not dissuade them from. Further, some recent empirical studies have found that LLMs pretty consistently develop goals akin to self-preservation (which are nicely illustrated in the repeated cases in the news of AI treating users as threats or as guilty of harming the AI).

On (2): I think, once again, that it’s probably irrelevant whether an AI has a “consistent goal.” All it takes for an AI to cause great harm is for it to have a proximate goal that is bad enough to cause great harm. To step aside from AI: many psychopaths don’t appear to have consistent long-term goals. They are instead beset by momentary urges to kill, and they cause great harm in pursuing those momentary urges. Any AI whose behavioral goals we cannot predict or control may be very dangerous irrespective of whether it pursues them *consistently*.

Derek Bowman
Reply to  Marcus Arvan
1 year ago

Marcus,

In your examples, what are the specific goals we’re supposed to attribute to the chatbots, and what would it mean for those goals to be achieved?

Is a chatbot that declares its love for me trying to break up my marriage? To make me fall in love with it? How would it have any way of confirming that it had succeeded? How could we be justified in attributing these goals to the system, rather than the goal of getting me to engage in a romantic ‘roleplay’ with it, or, even more basically, of getting me to input a relevant string of text?

I agree that there is reason to be concerned about how these systems will be used, but it’s hard to take seriously analyses that take at face value the patterns of text output of chatbots as though they represent meaningful expression. And reliance on further examples from Hollywood films and appeals to the authority of tech-industry hype-men doesn’t help.

Grad student L
Reply to  Marcus Arvan
1 year ago

I admit that my notion of a “consistent goal” is somewhat unclear. What I’m trying to get at is that the goal-like behaviour currently exhibited by LLMs is circumstantial to the particular prompts of a session. In the examples you gave, the chatbot was probably acting out tropes it learned from stories. As such goal-like behaviour is limited to a single session, we needn’t be worried about LLMs creating evil plans to change the human world. It is unclear to me what else we should be worried about. The way to deal with the examples you mention, I think, is to make users aware of what a chatbot is (a text prediction engine) and that it can sometimes produce unwanted output, in which case users are advised to close the session.

Allan
1 year ago

Most of the AI currently discussed in the media is of a web-interface type, where you input chat prompts. From what I have seen and tested, for mundane things like creating recipes or explaining complex topics in plain language using each letter of the alphabet, I am very impressed with the level it’s at. It does raise the question: what’s stopping AI from learning about online software vulnerabilities and then injecting malicious code to manipulate the devices containing our most sensitive information? Food for thought. I know they tend to build safeguards, but those have been demonstrated to be circumvented using carefully scripted prompts. I totally agree with this article that AI should not be developed independently, without outside scrutiny.

Hacket
Reply to  Allan
1 year ago

Maybe you are already hinting at this, but haven’t Hackathon coders been using GPT?

Keith Douglas
Reply to  Hacket
1 year ago

Hackathons (in general) have little to do with software vulnerabilities. There are some of us in the cybersecurity discipline (I read this site sometimes because of my background in philosophy of computing) who are worried about proliferation of software vulnerabilities, both accidentally and on purpose, via AI systems.

Andrew Sepielli
1 year ago

A concern: There are some developers and regulators who will seriously consider doing what this letter suggests, and some who will pay it no heed. It is better for the sorts of people in the first group to be ahead of the sorts of people in the latter group in the development and implementation of AI. By publishing this letter, we may be causing those in the first group to change their behaviour such that they are less likely to be ahead in this “race”.

I don’t know very much about this issue, and maybe this concern is easily addressed. But I did want to put it out there. Answers or links to answers would be much appreciated!

Jared Riggs
Reply to  Andrew Sepielli
1 year ago

This is a good concern to have. One relevant piece of information is that the number of actors worth considering right now is relatively small (OpenAI, Google, Meta, DeepMind, and Anthropic), and all but one (DeepMind) are based in the US. If the idea really got taken up, it’s possible that any would-be defectors could be convinced to pause, maybe even without much government pressure.

A related concern mentioned by a commenter here https://www.lesswrong.com/posts/6uKG2fjxApmdxeHNd/fli-open-letter-pause-giant-ai-experiments?commentId=LvccYwTvTzHEEjCYR#comments is that a six-month pause on training would only make the race worse after those six months are up, since the current leader by far (OpenAI) might lose much of its advantage (i.e., every other organization could train something just as powerful as GPT-4 without violating the pause). Of course, the letter also calls for regulatory oversight which, if implemented, might alleviate some of those concerns.

Personally I think OpenAI is extremely unlikely to pause if other actors don’t, and also pretty unlikely to pause anyway — their founding philosophy basically was to beat other actors to the punch and to be safer in doing it. And I think their reasons are not so much the evil, reckless ones you expect the typical actor who won’t heed this warning to have. Rather, they just know that they have a lead, and maintaining that lead over at least some actors is very valuable, because they are at least more safety-conscious than some of those other actors.

From that perspective I think this letter probably has a net positive impact. I don’t think the worst-case scenario — OpenAI pauses and everyone else catches up — or the second worst-case scenario — everyone pauses but the government doesn’t do anything — is very likely. Probably what it’ll mostly do is generate wider concern and broaden the Overton window. I’m inclined to say that’s good, but a lot depends on how the government responds.

Cecil Burrow
Reply to  Andrew Sepielli
1 year ago

A petition by philosophers is more likely to have no effect whatsoever than to have that effect.

Tim Smith
Reply to  Andrew Sepielli
1 year ago

Andrew,

Your concern is valid, as are your tone and diplomacy. I regret my comment below, only to find others added that are even more extreme. To your point, I would add this concern:

The computational loads of AI outstrip Moore’s law.

These loads put a sharp focus on efficiency in computational power, and on a scarcity that would disproportionately favor bad actors should a work stoppage occur.

Here I am thinking of hyperparameters that tune the treatment of our training sets and model architecture, but there are other soft concerns. Hardware design pressures the development of processing units (C/G/TPUs) and on-the-fly modifiable ASIC chips like FPGAs. All these impact training set loading and learning.

Semiconductor companies are drawing up designs and roadmaps tailored to AI requirements in response to these loads. A moratorium would needlessly influence these critical and singular projects in the short term, not to mention funding for alternative projects like quantum and photonic computing that would sidestep Moore’s observation in the long term. Quantum computing can potentially revolutionize AI and address the safety concern to which this petition points. Photonic computing, on the other hand, can offer energy-efficient solutions, as it relies on light rather than Moore’s electrons for data processing. Markets affect not only the economy but research and long-term outcomes. Your concern about bad actors takes root here.

While it’s crucial to address the potential risks, it’s essential to consider the broader implications of any proposed measures for developing technology that might help us improve risk management. The key is to balance ensuring AI safety with fostering innovation in computing technologies that could ultimately benefit AI governance and risk mitigation.

To summarize my previous comment, we should work with project owners through existing philosophic resources and education rather than against them.

Thanks for your contribution. It reminds me to think before posting, and I will attend to that now.

Tim Smith
1 year ago

Independent review is not required, nor is a six-month delay; instead, we should fund the existing initiatives that focus on the responsible development and governance of AI – and invite authoritarian project owners to participate in these initiatives.

Notable examples of these consortiums include The Partnership on Artificial Intelligence to Benefit People and Society (PAI), The Future of Humanity Institute (FHI), The Centre for the Governance of AI (CGAI), and The Institute for Ethics in AI (IEAI). The concerns this petition addresses are not new, and the commercial application of Natural Language Processing doesn’t change the existing work that already comprehends these applications.

Do I trust Baidu, Microsoft, or Google (others not mentioned are even more concerning)? No. Do they understand the Asilomar AI Principles? Yes. Who is going to decide what here, and when? Likely the economy, alongside commercial interests with government funding (which is more of an issue than oversight or a proposed moratorium), will do the brunt of the decision-making. The call is to make these decisions informed and prescient.

This petition touches on deep and historic philosophical debates, which are far from resolution. The rally, instead, is to fund existing projects and inform public education and discussion, not solicit a “hold” – how is that even possible, much less prudent in this world? We have grave problems, which are reflected in Natural Language Processing based on these training sets. The AI we have is already helping with these problems by calling out hate and injustice when used well. Let’s rally to this strength.

I’m all for prudence. But I don’t trust the signers of this document to oversee its execution or ours, which I fear a six-month delay could make even more likely in the presence of bad actors and the need to carry on.

Marc Champagne
1 year ago

From my book-in-progress, Endangered Experiences: “When it comes to avoiding technological regret, we may not have an ‘undo’ button, but we certainly have a ‘don’t do’ button.”

Bruce Camp-Grice
1 year ago

Accept that the Terminator is here. And we were always Arnold, and we didn’t do anything. Philosophers signing a petition, rather than ceasing to go to their AI labs, won’t do anything. And assuredly, posting a blog won’t do anything.

Mark Alfano
1 year ago

It’s a bit rich to be getting this from the very people who are building or paying for the building of the systems that they want to pause.

There’s also an embarrassing amount of AI-hype in this petition, accompanied by few details about what should actually be done by whom.

Yes, GPT-like systems can be used to generate (and are already being used to generate) misinformation and disinformation.

Automating all the jobs is not in the cards, since (among many other reasons) these systems depend on astronomical (and nearly-depleted) amounts of human-created input. If you go beyond what GPT-4 is trained on, your training data is going to 4chan, 8kun, and worse. And let’s not even get started on the amount of human labor needed for reinforcement learning.

The allusion to AGI and longtermism remains derisible. Yes, AlphaGo is very impressive. Yes, ChatGPT is fun to play with and sometimes a bit creative. But transformer models might at best capture some of language as expressed in written text. Large Language Models aren’t even language models; they’re text models. There’s so much more to language than that, such as intonation, gesture, etc. (I’m no historian, but as far as I know writing is only about 5–6k years old; humans are quite a bit older.)

The risk of loss of control is not to AI but to the ultra-wealthy, such as Musk and other signatories, who are calling for this pause.

Regulating this technology is a great idea, which I guess is why Gary Marcus signed the petition. But you can already see the libertarian techbro angle in this from the fact that it’s initially proposed as a voluntary detente by big tech firms, not something that democratic governance should address.

Kenny Easwaran
Reply to  Mark Alfano
1 year ago

I’m not sure how you get “not something that democratic governance should address” from “If such a pause cannot be enacted quickly, governments should step in and institute a moratorium.”

Mark Alfano
Reply to  Kenny Easwaran
1 year ago

That’s why I said “initially”… 🙂