Anthropic

Safety Evaluation of Claude Opus 4 and Claude Sonnet 4

Pages: 123

Time to read: 168 mins

Language: English

Summary

This technical report presents the safety evaluation of Claude Opus 4 and Claude Sonnet 4, two hybrid reasoning large language models developed by Anthropic. It outlines the pre-deployment safety tests conducted under the Responsible Scaling Policy, along with assessments of model behavior against the Usage Policy and evaluations of specific risks such as reward hacking. It also presents an alignment assessment covering a range of misalignment risks, and a model welfare assessment. The report explains the decision to deploy Claude Opus 4 under the AI Safety Level 3 Standard and Claude Sonnet 4 under the AI Safety Level 2 Standard. It further describes the model training process, the characteristics of the models, and the iterative evaluations that informed the release decision. Throughout, it emphasizes safety measures and ongoing monitoring as part of responsible AI deployment.
