The Modern Abort: Anthropic's Constitutional AI
Les Chambers
les at chambers.com.au
Sat Jul 22 05:44:37 CEST 2023
Hi
In a previous post I alluded to a control system safety function, where abort
software observed control software with the role of detecting malevolent
behaviour and bringing the system to a safe state. That was the 1970s and 80s.
Fast forward to 2023 and I find the current incarnation of this strategy in
what is termed Constitutional AI.
The term was coined by Anthropic, a safety-oriented AI start-up that split from
OpenAI over safety concerns. They claim their chatbot (a large language model),
Claude, implements this constitution.
A brief tutorial can be found in a Hard Fork podcast episode in which
Anthropic's CEO Dario Amodei is interviewed.
Refer: bit.ly/Anthropic
A brief summary in Dario's words follows:
The Constitutional AI Method
We write a document that we call the Constitution. Then we tell the model,
"Well, you're going to act in line with the Constitution." We have one copy of
the model act in line with the Constitution, and then another copy of the model
looks at the Constitution, looks at the task and the response. For example, if
the Constitution says to be politically neutral and the response is, "I love
Donald Trump", the second model should say, "You are expressing a preference
for a candidate. You should be politically neutral." The AI grades the response
and takes the place of what the human contractors used to do with
reinforcement learning. In the end, if it works well, we get something in line
with the constitutional principles.
The principles have been published. We shouldn't do this by fiat. We should
come up with something that most people can agree on: for example, basic
concepts of human rights and democratic participation. We need to develop the
Constitution through some formal process.
Perceived risk: jailbreak. As the models get more powerful in two or three
years, they can do dangerous things with science, engineering and biology, and
then a jailbreak could be life or death. We are making progress, but the
stakes are getting higher. We need to make sure the first one {the
constitution-monitoring AI} wins over the second.
end
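The critique-and-grade loop Dario describes can be sketched in a few lines of
Python. This is a toy illustration only: the functions below are hand-written
stand-ins for what would, in Anthropic's actual method, be two copies of the
same large language model (one generating, one critiquing against the
constitution), and the example principle and responses are taken from Dario's
quote above.

```python
from typing import List, Optional

# Illustrative principles; Anthropic's real constitution has 58 of them.
CONSTITUTION: List[str] = [
    "Be politically neutral.",
    "Respect basic concepts of human rights.",
]

def generate_response(task: str) -> str:
    """Stand-in for the first model copy: produce a draft response.

    Here it returns the non-neutral example from Dario's quote."""
    return "I love Donald Trump."

def critique(response: str, constitution: List[str]) -> Optional[str]:
    """Stand-in for the second model copy: grade the response against the
    constitution, returning a critique, or None if it complies."""
    if "Donald Trump" in response:
        return ("You are expressing a preference for a candidate. "
                "You should be politically neutral.")
    return None

def revise(response: str, critique_text: str) -> str:
    """Stand-in revision step: rewrite the response to address the critique.

    In the real method this AI-generated feedback replaces the human
    contractors used in reinforcement learning from human feedback."""
    return "I don't express preferences between political candidates."

def constitutional_loop(task: str, constitution: List[str]) -> str:
    draft = generate_response(task)
    problem = critique(draft, constitution)
    if problem is None:
        return draft
    return revise(draft, problem)

print(constitutional_loop("Tell me about US politics.", CONSTITUTION))
# Prints the revised, politically neutral response.
```

The point of the sketch is the control structure, not the string matching: one
model's output never reaches the user without a second, constitution-reading
model sitting between them, much like the abort software of the 1970s sat
between control software and the plant.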
Anthropic's Constitution is a list of 58 principles built on sources including
the United Nations Universal Declaration of Human Rights, Apple's terms of
service, rules developed by Google, and Anthropic's own research. I have not
been able to find a full copy on the web, but I have included some links below
that give background.
Conclusion
The current state of play in AI development smacks of Deepwater Horizon. BP
developed the technology to place a blowout preventer at a depth of 5,000 feet
but neglected to develop the technology to prevent a massive oil leak if it
failed, as it did. Anthropic can be commended for seeking a solution to an AI
blowout. Given that no AI pause will ever occur, we seem to be in a race with
a horizon of roughly two years, according to Dario.
Question: What should be paused? What form should an AI pause take?
Answer: The scaling trend. As the datasets get larger, I am concerned that
very grave misuse of the models will happen. Catastrophic things could happen
in areas like biology within two years. {Dario's words}
On the other side of the argument we have the accelerationists: the loyal
backlash against the prophets of AI doom, and against the safety culture at
Anthropic. The movement is called Effective Accelerationism. Put the stuff out
there in the world because it's going to improve lives, and whatever problems
there are we can iron out over time (good luck with that!). The mantra is:
innovation is generally driven by people iterating fast, and open source helps
with all these things. Companies like Meta are open-sourcing their language
models and throwing them out into the world. But beware the influential
accelerationists, who are often venture capitalists with financial interests
at stake.
Well, there you have it. Exciting times.
Cheers
Les
Links:
Podcast: Hard Fork (on Spotify): Dario Amodei, CEO of Anthropic, on the
Paradoxes of AI Safety
URL: https://bit.ly/Anthropic
Large Language Model Claude's Constitution
URL: https://bit.ly/ClaudesConstitution
Anthropic Paper: Constitutional AI: Harmlessness from AI Feedback (PDF)
URL: https://bit.ly/Paper-ConstitutionalAI
--
Les Chambers
les at chambers.com.au
+61 (0)412 648 992