Senior Software Engineer II, Developer Experience / Operational Excellence
Remote - UK
Full-time, Senior, Remote, Internet of Things
Description
In this role, you will design and build automated reliability and self-healing systems that protect production at scale. You will own and improve incident management tooling, reduce alert noise, and empower engineering teams to operate confidently. You will also develop observability infrastructure and contribute to AI-driven operational tooling.
Requirements
- Experience building automated safeguards and platform tooling for reliability
- Strong knowledge of observability and monitoring systems
- Experience with incident management and on-call processes
- Ability to partner with engineering teams to improve operational posture
- Empathy for on-call engineers and a bias toward reducing toil
Responsibilities
- Design and build automated reliability and self-healing systems for production
- Own and improve incident management tooling and on-call health
- Develop and evolve observability infrastructure (monitoring, alerting, SLOs)
- Contribute to AI-driven operational tooling for autonomous remediation
- Partner with product engineering teams to diagnose reliability gaps and reduce operational burden