Senior Site Reliability Engineer
Toronto, Ontario, Canada
Full-time, Senior, Hybrid, Incident management
Description
As an early SRE leader at Rootly, you will own the technical foundation, embedding with product teams to enhance observability, reliability, and performance. You'll build automation, define SLOs, and drive scaling and capacity planning efforts for a high-growth incident management platform.
Requirements
- 5+ years of experience in an SRE, Platform, or Infrastructure Engineering role
- 5+ years of experience writing software in a production environment
- Strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices
- Strong understanding of observability, performance tuning, and scaling strategies
- Deep familiarity with incident response, monitoring, and CI/CD systems
Responsibilities
- Embed with product teams to enhance observability, reliability, and performance of their services
- Own CI/CD pipelines, observability tooling, monitoring systems, and incident response processes
- Build tools and automation to eliminate manual toil and improve engineering velocity and developer experience
- Architect and scale infrastructure for best-in-class performance, availability, and operational excellence
- Define and manage SLOs and error budgets in partnership with Engineering teams