3h ago

Senior Site Reliability Engineer

Toronto, Ontario, Canada
full-time, senior, hybrid, incident management

Description

As an early SRE leader at Rootly, you will own the technical foundation, embedding with product teams to enhance observability, reliability, and performance. You'll build automation, define SLOs, and drive scaling and capacity planning efforts for a high-growth incident management platform.

Requirements

  • 5+ years of experience in an SRE, Platform, or Infrastructure Engineering role
  • 5+ years of experience writing software in a production environment
  • Strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices
  • Strong understanding of observability, performance tuning, and scaling strategies
  • Deep familiarity with incident response, monitoring, and CI/CD systems

Responsibilities

  • Embed with product teams to enhance observability, reliability, and performance of their services
  • Own CI/CD pipelines, observability tooling, monitoring systems, and incident response processes
  • Build tools and automation to eliminate manual toil and improve engineering velocity and developer experience
  • Architect and scale infrastructure for best-in-class performance, availability, and operational excellence
  • Define and manage SLOs and error budgets in partnership with Engineering teams