19h ago

Senior Staff Cloud Backend Engineer - Observability and Site Reliability

Bengaluru

✨ $250k-$400k / year (est.)

full-time · lead · software

💼 About This Role

You'll design, build, and operate scalable observability and reliability solutions for large-scale datacenter infrastructure. You'll develop high-performance monitoring and telemetry platforms, ensuring system reliability and driving operational excellence through automation and SRE best practices.

🎯 What You'll Do

  • Design and maintain observability solutions for datacenter infrastructure
  • Develop and operate large-scale observability and telemetry platforms
  • Automate infrastructure provisioning, monitoring, and system management
  • Lead root cause analysis and post-incident reviews

📋 Requirements

  • 12+ years of progressive software engineering experience
  • Strong proficiency in Go or Python
  • Expert-level knowledge of Kubernetes internals and containerization
  • Proficiency in Prometheus, Grafana, or ELK Stack

✨ Nice to Have

  • Experience building infrastructure for LLM inference or large-scale training
  • Familiarity with mixed precision or custom hardware accelerators
  • Experience managing hybrid-cloud or multi-AZ deployments

🎁 Benefits & Perks

  • 🏖️ Unlimited PTO
  • 🏥 Health insurance
  • 💰 Equity
  • 📈 Career growth
  • 💻 Remote work options

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter call · 30 min
  2. Technical screen · 60 min
  3. On-site interviews · 4 hours