Research Engineer, Safeguards Labs
San Francisco, CA | New York City, NY
$350K–$850K / year
Full-time · Senior · AI/ML · Visa Sponsor
About This Role
You'll define and execute the Labs research agenda at Anthropic, prototyping novel safety methods for Claude. Your work will directly protect users by detecting misuse, strengthening model safeguards, and transferring prototypes into production. This role offers substantial latitude in a small, high-leverage team.
What You'll Do
- Lead and contribute to research projects on detecting misuse of Claude.
- Design offline analyses over model usage data to surface abuse patterns.
- Build classifiers and detection systems, and evaluate their effectiveness.
- Partner with engineers on tech transfer of prototypes to production.
Requirements
- Track record of independently driving research projects from ambiguous problems to results.
- Proficient in Python and comfortable with large datasets.
- Working familiarity with large language models (sampling, prompting, training).
- Ability to scope own work and switch between research, engineering, analysis.
Nice to Have
- Experience building ML models for abuse, fraud, or security applications.
- Knowledge of evaluation methodologies and evals design for language models.
- Background in trust and safety, integrity, threat intelligence, or adversarial ML.
Benefits & Perks
- Annual compensation: $350K–$850K USD
- Visa sponsorship offered and supported.
- Hybrid policy: in-office at least 25% of the time.
- Work on cutting-edge AI safety with a top research team.
Hiring Process
Estimated timeline: 2–4 weeks (AI estimate)
1. Phone Screen · 30 min
2. Technical Interview · 60 min
3. On-site / Final Round · Half day
Heads Up
- Wide salary range ($350K–$850K) may indicate role level ambiguity.
- Visa sponsorship is not guaranteed for every candidate.