Senior Site Reliability Engineer at Rootly

Senior Site Reliability Engineer

Toronto, Ontario, Canada

full-time, senior, Hybrid, incident management

Description

As an early SRE leader at Rootly, you will own the technical foundation, embedding with product teams to enhance observability, reliability, and performance. You'll build automation, define SLOs, and drive scaling and capacity planning efforts for a high-growth incident management platform.

Requirements

5+ years of experience in an SRE, Platform, or Infrastructure Engineering role
5+ years of experience writing software in a production environment
Strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices
Strong understanding of observability, performance tuning, and scaling strategies
Deep familiarity with incident response, monitoring, and CI/CD systems

Responsibilities

Embed with product teams to enhance observability, reliability, and performance of their services
Own CI/CD pipelines, observability tooling, monitoring systems, and incident response processes
Build tools and automation to eliminate manual toil and improve engineering velocity and developer experience
Architect and scale infrastructure for best-in-class performance, availability, and operational excellence
Define and manage SLOs and error budgets in partnership with Engineering teams
