6h ago
Incident Engineer
Gurugram
✨ $20k-$35k / year (est.)
Full-time · Mid-level · Remote · Artificial Intelligence
💼 About This Role
You'll own the full incident lifecycle for an agentic AI platform serving global brands. You'll act as central command during major incidents, driving root cause analysis and preventive actions while improving observability and incident tooling.
🎯 What You'll Do
- Own incident lifecycle: detection, triage, escalation, resolution, postmortems
- Act as central command during major incidents and war rooms
- Define and enforce SLAs/SLOs, severity frameworks, and runbooks
- Monitor system health across integrations and pipelines
📋 Requirements
- 3–6 years in Incident Management / SRE / Production Support
- Strong understanding of distributed systems, APIs, and cloud environments (AWS)
- Experience with observability tools such as Datadog
- Familiarity with AI/ML systems, especially LLM integrations
✨ Nice to Have
- Exposure to OpenAI or similar LLM platforms
- Experience supporting customer-facing SaaS products
- Automation mindset, applied to runbooks and alert tuning
🎁 Benefits & Perks
- 🏖️ Unlimited PTO
- 🏥 Health insurance
- 💻 Remote work