Senior Research Scientist, Reward Models at Anthropic

San Francisco, CA

$350,000-$500,000 / year

Full-time, Senior, Hybrid, Artificial Intelligence

Description

As a Senior Research Scientist on our Reward Models team, you will lead research to improve how we specify and learn human preferences at scale, directly shaping how Claude understands and optimizes for human values. You will develop novel architectures and training methods for RLHF, research LLM-based evaluation techniques, and investigate reward hacking mitigation, collaborating with teams across Anthropic to translate insights into production improvements.

Requirements

Track record of research contributions in reward modeling, RLHF, or related ML areas
Experience training and evaluating reward models for large language models
Comfort with designing and running large-scale experiments that use significant computational resources
Ability to work across research and engineering with scientific rigor
Strong communication skills and collaborative mindset

Responsibilities

Lead research on reward model architectures and RLHF training approaches
Develop and evaluate LLM-based grading methods including rubric-driven approaches
Research techniques to detect and mitigate reward hacking
Design experiments to understand reward model generalization and failure modes
Collaborate with the Finetuning team to translate research into production pipeline improvements
