19h ago

Site Reliability Engineer

San Francisco Bay Area

$180k-$270k / year est.

full-time · senior · ai-ml

💼 About This Role

You'll own production reliability and platform engineering for Devin and Windsurf, used by hundreds of thousands of developers daily. You'll build monitoring, incident response, and CI/CD systems to keep products running smoothly. This role combines SRE and platform engineering in a small, elite AI lab.

🎯 What You'll Do

  • Define and own SLOs, SLIs, and error budgets for Devin and Windsurf
  • Lead incident response and run blameless postmortems
  • Own deployment pipelines, release infrastructure, and developer tooling
  • Manage cloud infrastructure as code and ensure reproducible environments

📋 Requirements

  • Deep experience running production systems at scale with SLOs and on-call
  • Strong software engineering fundamentals and ability to write real code
  • Proficiency with cloud infrastructure (AWS, GCP, or Azure) and Kubernetes
  • Experience building and owning CI/CD pipelines for fast-moving teams

✨ Nice to Have

  • Experience with developer-facing products or platforms
  • Familiarity with incident command and blameless postmortem culture
  • Product empathy to understand reliability from a user's perspective

🎁 Benefits & Perks

  • 🏖️ Unlimited PTO
  • 🏥 Comprehensive health insurance
  • 🍽️ Free meals and snacks