19h ago
Site Reliability Engineer
San Francisco Bay Area
✨ $180k-$270k / year (est.)
full-time · senior · ai-ml
💼 About This Role
You'll own production reliability and platform engineering for Devin and Windsurf, used by hundreds of thousands of developers daily. You'll build monitoring, incident response, and CI/CD systems to keep products running smoothly. This role combines SRE and platform engineering in a small, elite AI lab.
🎯 What You'll Do
- Define and own SLOs, SLIs, and error budgets for Devin and Windsurf
- Lead incident response and run blameless postmortems
- Own deployment pipelines, release infrastructure, and developer tooling
- Manage cloud infrastructure as code and ensure reproducible environments
📋 Requirements
- Deep experience running production systems at scale with SLOs and on-call
- Strong software engineering fundamentals and ability to write real code
- Proficiency with cloud infrastructure (AWS, GCP, or Azure) and Kubernetes
- Experience building and owning CI/CD pipelines for fast-moving teams
✨ Nice to Have
- Experience with developer-facing products or platforms
- Familiarity with incident command and blameless postmortem culture
- Product empathy to understand reliability from a user's perspective
🎁 Benefits & Perks
- 🏖️ Unlimited PTO
- 🏥 Comprehensive health insurance
- 🍽️ Free meals and snacks