18h ago
Staff Site Reliability Engineer
New York City, NY
$157.7k-$277.8k / year
full-timesenior Hybridai-ml
๐ Tech Stack
+2
๐ผ About This Role
You'll build resilient systems and automate infrastructure for an enterprise AI platform handling high-traffic environments. Your work will directly ensure 24/7 availability for customers like Marriott and Uber. This role offers the chance to shape reliability practices at a rapidly growing unicorn.
๐ฏ What You'll Do
- Build AI-native approaches for infrastructure management using Python or Go
- Design scalable, fault-tolerant infrastructure on AWS, GCP, or Azure
- Own reliability, performance, and efficiency of core services with SLOs
- Lead incident response, post-mortems, and root cause analyses
๐ Requirements
- 7+ years experience in SRE or similar role
- Deep expertise with AWS, Docker, Kubernetes, and Terraform
- Strong proficiency in Python, Java, or Go for automation
- Knowledge of Prometheus, Grafana, or ELK Stack
โจ Nice to Have
- Experience with AI/ML platforms
- Experience with high-traffic production systems
๐ Benefits & Perks
- ๐๏ธ Generous PTO plus company holidays
- ๐ฅ Medical, dental, and vision coverage for you and family
- ๐ถ 12 weeks paid parental leave for all parents
- ๐ช Wellness stipend for gym, massage, etc.
- ๐ Learning and development stipend
๐จ Hiring Process
Estimated timeline: 2-4 weeks ยท AI estimate
- 1Phone Screenยท 30 min
- 2Technical Interviewยท 60 min
- 3System Design Interviewยท 60 min
0 0 0