
Senior Software Engineer II, Developer Experience, Operational Excellence

Remote - US or Canada (Eastern Time Zone)
Full-time, Senior, Remote, IoT / Connected Operations

Description

You will design and build automated reliability and self-healing systems that protect production at scale, including automated rollbacks, deploy safeguards, and fault mitigation. You will develop observability infrastructure and contribute to AI-driven operational tooling that goes beyond triage to autonomous remediation. You will also partner with product engineering teams to strengthen operational posture and champion best practices across the organization.

Requirements

  • 8+ years of software engineering experience
  • Bachelor's degree in Computer Science or Engineering, or equivalent
  • 3+ years in infrastructure/platform engineering
  • Expertise in observability, reliability, operational metrics, and data analysis
  • Experience with Datadog or equivalent observability tooling

Responsibilities

  • Design and build automated reliability and self-healing systems at scale
  • Own and improve incident management tooling and on-call health
  • Develop and evolve observability infrastructure, including monitoring, alerting, and SLOs
  • Contribute to AI-driven operational tooling for autonomous remediation
  • Drive incident prevention by eliminating operational toil