Anthropic
Safety Evaluation of Claude Opus 4 and Claude Sonnet 4
Pages: 123
Time to read: 168 mins
Language: English
This technical report presents the safety evaluation of Claude Opus 4 and Claude Sonnet 4, two hybrid reasoning large language models developed by Anthropic. It outlines the pre-deployment safety testing conducted under Anthropic's Responsible Scaling Policy. That testing includes assessments of model behavior with respect to the Usage Policy and evaluations of risks such as reward hacking. The report also details an alignment assessment that identifies a range of misalignment risks, and it includes a model welfare assessment. It explains the decision to deploy Claude Opus 4 under the AI Safety Level 3 (ASL-3) Standard and Claude Sonnet 4 under the AI Safety Level 2 (ASL-2) Standard. In addition, it describes the model training process, the characteristics of the models, and the iterative evaluations that informed the release decision. Throughout, the report stresses that safety measures and ongoing monitoring are needed for responsible deployment.