Software Engineer, Site Reliability (SRE)
San Francisco, CA
$230k-$390k / year
full-time, senior, software
💼 About This Role
You'll join Sierra's Site Reliability team to define and build the foundation for reliability, observability, and scalability across our AI-driven infrastructure. You'll partner with core engineering and product teams to ensure high availability and efficiency. This role offers the chance to own SRE practices at an early-stage AI startup with top-tier founders.
🎯 What You'll Do
- Own the observability stack (monitoring, alerting, logging, tracing)
- Design reliable and scalable cloud infrastructure on AWS using Terraform
- Improve reliability and scalability of LLM deployments
- Lead improvements to CI/CD pipelines and incident management
📋 Requirements
- 5+ years of hands-on site reliability or infrastructure engineering experience
- Deep experience with Terraform, AWS services, and container orchestration
- Strong background in observability systems (e.g., Prometheus, Grafana, Datadog)
- Experience designing for availability and scalability at infrastructure and application layers
✨ Nice to Have
- Experience with LLM infrastructure (inference optimization, model deployment)
- Past experience in an early-stage startup defining SRE culture
- Familiarity with incident management automation
🎁 Benefits & Perks
- 🏖️ Flexible PTO
- 🏥 Medical, dental, and vision coverage for you and your family
- 💰 Retirement Plan with Sierra match
- 👶 Parental Leave and fertility benefits
- 🍽️ Lunch, snacks, and coffee