Member of Technical Staff - Safety Lead

San Francisco

$200k-$300k / year (est.)

full-time · senior · ai-ml

💼 About This Role

You'll own the red-teaming and adversarial evaluation pipeline for Reflection's open-weight AI models, probing for failure modes across security, misuse, and alignment gaps. You'll work hand in hand with the Alignment team to translate safety findings into concrete guardrails for every release. This role is a critical gatekeeper for shipping open-weight models.

🎯 What You'll Do

  • Own the red-teaming and adversarial evaluation pipeline for models.
  • Translate safety findings into concrete guardrails with the Alignment team.
  • Validate that every release meets risk thresholds before shipping.
  • Develop scalable, automated safety benchmarks and adversarial tests.

📋 Requirements

  • Graduate degree (MS/PhD) in CS or ML, or equivalent experience in AI safety.
  • Deep technical understanding of LLM safety and adversarial attacks.
  • Strong software engineering skills for building automated evaluation pipelines.

✨ Nice to Have

  • Experience with reinforcement learning (RLHF/RLAIF).
  • Experience building large-scale ML systems.

🎁 Benefits & Perks

  • 💰 Top-tier compensation with salary and equity.
  • 🏥 Comprehensive health & wellness coverage, including medical, dental, and vision.
  • 👶 Fully paid parental leave and family planning support.
  • 🏖️ Paid time off and relocation support.
  • 🍽️ Daily lunch and dinner provided, plus off-sites.