Senior Software Engineer II, Developer Experience / Operational Excellence

Remote - UK
Full-time, Senior, Remote, Internet of Things

Description

In this role, you will design and build automated reliability and self-healing systems that protect production at scale. You will own and improve incident management tooling, reduce alert noise, and empower engineering teams to operate confidently. You will also develop observability infrastructure and contribute to AI-driven operational tooling.

Requirements

  • Experience building automated safeguards and platform tooling for reliability
  • Strong knowledge of observability and monitoring systems
  • Experience with incident management and on-call processes
  • Ability to partner with engineering teams to improve operational posture
  • Empathy for on-call engineers and a bias toward reducing toil

Responsibilities

  • Design and build automated reliability and self-healing systems for production
  • Own and improve incident management tooling and on-call health
  • Develop and evolve observability infrastructure (monitoring, alerting, SLOs)
  • Contribute to AI-driven operational tooling for autonomous remediation
  • Partner with product engineering teams to diagnose reliability gaps and reduce operational burden