Senior Site Reliability Engineer
Dublin, CA
✨ $160k-$220k / year (est.)
Full-time · Senior · AI/ML
💼 About This Role
You'll architect and maintain scalable infrastructure for our GenAI SaaS platform, ensuring reliability and performance. You'll define service-level indicators (SLIs) and objectives (SLOs) and lead incident response to minimize downtime. This role bridges development and operations for cutting-edge AI workloads.
🎯 What You'll Do
- Architect scalable, highly available infrastructure for the GenAI platform
- Design monitoring, alerting, and observability solutions
- Automate deployment, scaling, and management of cloud-native infrastructure
- Participate in on-call rotations and respond to production incidents
📋 Requirements
- 8+ years in DevOps or SRE roles
- Experience with cloud platforms (AWS, GCP, or Azure)
- Experience with infrastructure-as-code tools (Terraform, CloudFormation)
- Experience with containerization technologies (Docker, Kubernetes)
✨ Nice to Have
- Experience supporting AI/ML systems in production
- Knowledge of GPU infrastructure management
- Familiarity with distributed systems and high-performance computing