Senior Software Engineer II, Developer Experience, Operational Excellence
Remote - US or Canada (Eastern Time Zone)
Full-time, Senior, Remote, IoT / Connected Operations
Description
You will design and build automated reliability and self-healing systems that protect production at scale, including automated rollbacks, deploy safeguards, and fault mitigation. You'll develop observability infrastructure and contribute to AI-driven operational tooling that goes beyond triage. You'll also partner with product engineering teams to strengthen their operational posture and champion best practices across the organization.
Requirements
- 8+ years of software engineering experience
- Bachelor's degree in Computer Science/Engineering or equivalent
- 3+ years in infrastructure/platform engineering
- Expertise in observability, reliability, operational metrics, and data analysis
- Experience with Datadog or equivalent observability tooling
Responsibilities
- Design and build automated reliability and self-healing systems at scale
- Own and improve incident management tooling and on-call health
- Develop and evolve observability infrastructure, including monitoring, alerting, and SLOs
- Contribute to AI-driven operational tooling for autonomous remediation
- Drive incident prevention by eliminating operational toil