Senior Staff Cloud Backend Engineer - Observability and Site Reliability
Bengaluru
$250k-$350k / year (est.)
Full-time · Lead · Hybrid · E-commerce
About This Role
You'll design, build, and operate scalable observability and reliability solutions for large-scale datacenter infrastructure at Coupang, a leading e-commerce company. Your work will ensure system reliability and operational excellence through automation and SRE best practices. This role offers the chance to drive improvements across the full service lifecycle in a fast-growing, high-impact environment.
What You'll Do
- Design and maintain observability solutions for datacenter infrastructure.
- Develop and operate large-scale telemetry platforms with real-time monitoring.
- Apply SRE principles to improve reliability and operational efficiency.
- Lead root cause analysis and post-incident reviews.
- Automate infrastructure provisioning and system management.
Requirements
- 12+ years of software engineering experience with distributed systems.
- Strong proficiency in Go or Python.
- Expert-level knowledge of Kubernetes internals and containerization.
- Proficiency in observability tools such as Prometheus, Grafana, and the ELK Stack.
Nice to Have
- Experience building infrastructure for LLM inference or training clusters.
- Familiarity with mixed precision or custom hardware accelerators.
- Experience managing hybrid-cloud or multi-AZ deployments.
Benefits & Perks
- Flexible Hybrid Work Model - 3 days in office per week.
- Competitive Compensation - Including stock options.
- Growth Opportunities - Work at a rapidly scaling global company.
- Health Insurance - Comprehensive coverage.
- Meal Allowances - On-campus dining options.
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Phone Interview · 60 min
- 3. On-site Interviews (4-5 rounds) · 4-5 hours