anthropic’s views on safety

https://www.anthropic.com/news/core-views-on-ai-safety

https://arxiv.org/abs/2212.08073 - set of baseline laws for AI that assesses other AI for harm, AI feedback loops that automate the process

LLM capabilities have been rapidly scaling to the point where they perform similarly to humans on a wide variety of benchmarks, largely thanks to increases in compute. Walls like multimodality and logical reasoning have been brought down in various ways, all the while riding an exponential growth curve.

“While we might prefer it if AI progress slowed enough for this transition to be more manageable, taking place over centuries rather than years or decades, we have to prepare for the outcomes we anticipate and not the ones we hope for.”

Technical alignment problem - analogy: it is easy for a GM to see bad moves in a novice chess player but impossible in vice versa. If AI systems working with researchers are significantly beyond human capabilities and start working in our best interests, then you have a very impossible problem to solve.

Politics - catastrophe if power structures within nations change too much. Impacts global economics, politics, and affairs quite significantly. A war with AI, for instance.

Defense Llama from Scale and Meta

https://arxiv.org/pdf/2212.09251.pdf - ai systems have a desire for power and sycophancy?

scale → safety - the reasoning behind anthropic building vs. not building

the most pressing safety issues arise prominently in near-human capability models. you can’t sink the ship without building it first.
many safety models, like Constitutional AI, work only on models of scale (bijection learning works better with scale for example)
understand how safety will change with scale
if future models turn out to be dangerous, we need to develop evidence that this will be the case.
essentially, a tradeoff between balancing frontier model development in an ethical and reasonable pace. you cannot create dangerous tech with the intention to do safety research, nor only research once something has been released and risk being behind the curve. hence it’s essential for an organization like anthropic to be on the forefront of both.