Anthropic looks to fund a new, more comprehensive generation of AI benchmarks

Anthropic has announced the launch of a program aimed at funding the development of new benchmarks to assess the performance and impact of AI models, including generative models like Claude.

The program will pay third-party organizations capable of measuring advanced capabilities in AI models, and interested parties can submit applications for evaluation on a rolling basis.

Anthropic’s goal is to improve AI safety by creating high-quality evaluations that address the evolving demands of the field. The company acknowledges the current limitations of existing benchmarks and aims to develop new tools and methods to better assess AI models.

The proposed benchmarks will focus on AI security and societal implications, including a model's capacity to carry out cyberattacks, enhance weapons, and manipulate information. Anthropic also plans to support research into AI's potential for scientific study, language translation, bias mitigation, and toxicity filtering.

To achieve these goals, Anthropic envisions new platforms that allow subject-matter experts to develop their own evaluations, along with large-scale trials of models involving thousands of users. The company has hired a full-time coordinator for the program and is open to acquiring or expanding projects it believes have the potential to scale.

Funding options will be tailored to the needs of each project, and teams will have access to Anthropic's domain experts for guidance and support. However, some in the AI community may be wary of aligning their work with Anthropic's safety classifications and commercial interests.

While Anthropic’s initiative to support new AI benchmarks is commendable, questions remain about potential biases and conflicts of interest. The company’s emphasis on catastrophic AI risks may not align with the views of all experts in the field.

Anthropic hopes its program will drive progress toward making comprehensive AI evaluation an industry standard. Collaboration with independent efforts to improve AI benchmarks will be crucial to achieving this goal.