Staff Platform Site Reliability Specialist (Observability & Kubernetes)
Ontario
~$150,000-$210,000 / year (est.)
Full-time | Lead | Remote | Enterprise Software / Critical Event Management
💼 About This Role
You'll own and evolve the enterprise observability platform for Everbridge, ensuring deep visibility into system health across a large-scale cloud-native environment. Your core impact includes standardizing instrumentation and driving reliability improvements.
🎯 What You'll Do
- Design, operate, and evolve the Grafana observability stack
- Build and maintain highly available platforms for metrics, logs, and traces
- Manage EKS cluster lifecycle, security, and upgrades
- Standardize instrumentation, dashboards, alerts, and SLOs
📋 Requirements
- 6+ years in SRE or Platform Engineering
- Expertise in Grafana ecosystem (Loki, Mimir, Tempo)
- Proficiency in Kubernetes and Amazon EKS
- Experience with Terraform for infrastructure as code
✨ Nice to Have
- Experience with GitLab CI/CD
- Familiarity with HashiCorp Packer
- Knowledge of AWS/GCP cloud environments