
Research Engineer, Safeguards Labs

San Francisco, CA | New York City, NY

$350K–$850K / year

Full-time · Senior · AI/ML · Visa Sponsor

💼 About This Role

You'll define and execute the research agenda of Safeguards Labs at Anthropic, prototyping novel safety methods for Claude. Your work will directly protect users by detecting misuse, strengthening model safeguards, and transferring prototypes into production. This role offers substantial latitude on a small, high-leverage team.

🎯 What You'll Do

  • Lead and contribute to research projects on detecting misuse of Claude.
  • Design offline analyses over model usage data to surface abuse patterns.
  • Build classifiers and detection systems, and evaluate their effectiveness.
  • Partner with engineers on tech transfer of prototypes to production.

📋 Requirements

  • Track record of independently driving research projects from ambiguous problems to results.
  • Proficient in Python and comfortable with large datasets.
  • Working familiarity with large language models (sampling, prompting, training).
  • Ability to scope your own work and switch between research, engineering, and analysis.

✨ Nice to Have

  • Experience building ML models for abuse, fraud, or security applications.
  • Knowledge of evaluation methodologies and evals design for language models.
  • Background in trust and safety, integrity, threat intelligence, or adversarial ML.

🎁 Benefits & Perks

  • 💰 Annual compensation: $350K–$850K USD
  • 🌍 Visa sponsorship offered and supported.
  • 🏢 Hybrid policy: in-office at least 25% of the time.
  • 🧠 Work on cutting-edge AI safety with a top research team.

📨 Hiring Process

Estimated timeline: 2–4 weeks · AI estimate

  1. Phone Screen · 30 min
  2. Technical Interview · 60 min
  3. On-site / Final Round · Half day

🚩 Heads Up

  • Wide salary range ($350K–$850K) may indicate role level ambiguity.
  • Visa sponsorship is not guaranteed for every candidate.