https://www.anthropic.com/news/core-views-on-ai-safety

https://arxiv.org/abs/2212.08073 - set of baseline laws for AI that assesses other AI for harm, AI feedback loops that automate the process

LLM capabilities have been rapidly scaling to the point where they perform similarly to humans on a wide variety of benchmarks, largely thanks to increases in compute. Walls like multimodality and logical reasoning have been brought down in various ways, all the while riding an exponential growth curve.

“While we might prefer it if AI progress slowed enough for this transition to be more manageable, taking place over centuries rather than years or decades, we have to prepare for the outcomes we anticipate and not the ones we hope for.”

Technical alignment problem - analogy: it is easy for a GM to see bad moves in a novice chess player but impossible in vice versa. If AI systems working with researchers are significantly beyond human capabilities and start working in our best interests, then you have a very impossible problem to solve.

Politics - catastrophe if power structures within nations change too much. Impacts global economics, politics, and affairs quite significantly. A war with AI, for instance.

https://arxiv.org/pdf/2212.09251.pdf - ai systems have a desire for power and sycophancy?

scale → safety - the reasoning behind anthropic building vs. not building