Software Engineer, Site Reliability (SRE)

San Francisco, CA

$230k-$390k / year

Full-time · Senior · Software

💼 About This Role

You'll join Sierra's Site Reliability team to define and build the foundations of reliability, observability, and scalability across our AI-driven infrastructure. You'll partner with core engineering and product teams to ensure high availability and efficiency. This role offers the chance to own SRE practices at an early-stage AI startup with top-tier founders.

🎯 What You'll Do

  • Own the observability stack (monitoring, alerting, logging, tracing)
  • Design reliable and scalable cloud infrastructure on AWS using Terraform
  • Improve reliability and scalability of LLM deployments
  • Lead improvements to CI/CD pipelines and incident management

📋 Requirements

  • 5+ years of hands-on site reliability or infrastructure engineering experience
  • Deep experience with Terraform, AWS services, and container orchestration
  • Strong background in observability systems (e.g., Prometheus, Grafana, Datadog)
  • Experience designing for availability and scalability at both the infrastructure and application layers

✨ Nice to Have

  • Experience with LLM infrastructure (inference optimization, model deployment)
  • Past experience defining SRE culture at an early-stage startup
  • Familiarity with incident management automation

🎁 Benefits & Perks

  • 🏖️ Flexible PTO
  • 🏥 Medical, dental, and vision coverage for you and your family
  • 💰 Retirement Plan with Sierra match
  • 👶 Parental Leave and fertility benefits
  • 🍽️ Lunch, snacks, and coffee