Researcher, Automated Red Teaming

San Francisco

$295k-$445k / year

Full-time · Senior · AI/ML

💼 About This Role

You'll own the research and technical direction for automated red teaming across catastrophic risk areas, such as classifier jailbreak discovery and bio threat elicitation. You'll partner closely with vertical risk teams and the Classifiers team to turn attacks into training data and robustness gains. This role directly reduces real-world catastrophic risk from frontier AI models.

🎯 What You'll Do

  • Set research direction for automated red teaming across catastrophic risk areas.
  • Build scalable systems for continuous discovery of model failure modes.
  • Partner with risk teams to define threat models and prioritize mitigations.
  • Translate attacks into training data and robustness improvements.

📋 Requirements

  • PhD or equivalent experience in AI safety, adversarial ML, or related field.
  • Hands-on experience with LLMs and agentic systems, including multi-turn behaviors and tool use.
  • Strong software engineering skills; ability to build and maintain production-adjacent pipelines.

✨ Nice to Have

  • Experience in adversarial ML, security research, or abuse prevention.
  • Experience building large-scale eval infrastructure.

🎁 Benefits & Perks

  • 🏖️ Flexible PTO
  • 🧑‍💻 Remote work options available
  • 🏥 Comprehensive health insurance
  • 💰 Competitive salary and equity
  • 🎓 Learning and development stipend