Senior Research Scientist, Reward Models

San Francisco, CA

$350,000-$500,000 / year

Full-time, Senior, Hybrid, Artificial Intelligence

Description

As a Senior Research Scientist on our Reward Models team, you will lead research to improve how we specify and learn human preferences at scale, directly shaping how Claude understands and optimizes for human values. You will develop novel architectures and training methods for RLHF, research LLM-based evaluation techniques, and investigate reward hacking mitigation, collaborating with teams across Anthropic to translate insights into production improvements.

Requirements

  • Track record of research contributions in reward modeling, RLHF, or related ML areas
  • Experience training and evaluating reward models for large language models
  • Comfort designing and running large-scale experiments with significant computational resources
  • Ability to work across research and engineering with scientific rigor
  • Strong communication skills and collaborative mindset

Responsibilities

  • Lead research on reward model architectures and RLHF training approaches
  • Develop and evaluate LLM-based grading methods including rubric-driven approaches
  • Research techniques to detect and mitigate reward hacking
  • Design experiments to understand reward model generalization and failure modes
  • Collaborate with the Finetuning team to translate research into production pipeline improvements