Senior Staff Cloud Backend Engineer - Observability and Site Reliability
Bengaluru
$250k-$400k / year (est.)
Full-time · Lead · Software
About This Role
You'll design, build, and operate scalable observability and reliability solutions for large-scale datacenter infrastructure. You'll develop high-performance monitoring and telemetry platforms, ensuring system reliability and driving operational excellence through automation and SRE best practices.
What You'll Do
- Design and maintain observability solutions for datacenter infrastructure
- Develop and operate large-scale observability and telemetry platforms
- Automate infrastructure provisioning, monitoring, and system management
- Lead root cause analysis and post-incident reviews
Requirements
- 12+ years of progressive software engineering experience
- Strong proficiency in Go or Python
- Expert-level knowledge of Kubernetes internals and containerization
- Proficiency in Prometheus, Grafana, or the ELK Stack
Nice to Have
- Experience building infrastructure for LLM inference or large-scale training
- Familiarity with mixed-precision computation or custom hardware accelerators
- Experience managing hybrid-cloud or multi-AZ deployments
Benefits & Perks
- Unlimited PTO
- Health insurance
- Equity
- Career growth
- Remote work options
Hiring Process
Estimated timeline: 2-4 weeks (AI estimate)
1. Recruiter call · 30 min
2. Technical screen · 60 min
3. On-site interviews · 4 hours