Research Engineer, Safeguards Labs
San Francisco, CA | New York City, NY
$350K–$850K / year
Full-time · Senior · AI/ML · Visa Sponsor
About This Role
You'll define and execute the Labs research agenda at Anthropic, prototyping novel safety methods for Claude. Your work will directly protect users by detecting misuse, strengthening model safeguards, and transferring prototypes into production. This role offers substantial latitude in a small, high-leverage team.
What You'll Do
- Lead and contribute to research projects on detecting misuse of Claude.
- Design offline analyses over model usage data to surface abuse patterns.
- Build classifiers and detection systems, and evaluate their effectiveness.
- Partner with engineers on tech transfer of prototypes to production.
Requirements
- Track record of independently driving research projects from ambiguous problems to results.
- Proficient in Python and comfortable with large datasets.
- Working familiarity with large language models (sampling, prompting, training).
- Ability to scope own work and switch between research, engineering, analysis.
Nice to Have
- Experience building ML models for abuse, fraud, or security applications.
- Knowledge of evaluation methodologies and evals design for language models.
- Background in trust and safety, integrity, threat intelligence, or adversarial ML.
Benefits & Perks
- Annual compensation: $350K–$850K USD
- Visa sponsorship offered and supported.
- Hybrid policy: in-office at least 25% of the time.
- Work on cutting-edge AI safety with a top research team.
Hiring Process
Estimated timeline: 2–4 weeks (AI estimate)
1. Phone Screen · 30 min
2. Technical Interview · 60 min
3. On-site / Final Round · Half day
Heads Up
- Wide salary range ($350K–$850K) may indicate role level ambiguity.
- Visa sponsorship is not guaranteed for every candidate.