3h ago

Staff Platform Site Reliability Specialist (Observability & Kubernetes)

Ontario

~$150,000-$210,000 / year (est.)

Full-time | Lead | Remote | Enterprise Software / Critical Event Management

💼 About This Role

You'll own and evolve Everbridge's enterprise observability platform, ensuring deep visibility into system health across a large-scale, cloud-native environment. Your core contributions will be standardizing instrumentation and driving reliability improvements.

🎯 What You'll Do

  • Design, operate, and evolve the Grafana observability stack
  • Build and maintain highly available platforms for metrics, logs, and traces
  • Manage EKS cluster lifecycle, security, and upgrades
  • Standardize instrumentation, dashboards, alerts, and SLOs

📋 Requirements

  • 6+ years in SRE or Platform Engineering
  • Expertise in the Grafana ecosystem (Loki, Mimir, Tempo)
  • Proficiency in Kubernetes and Amazon EKS
  • Experience with Terraform for infrastructure as code

✨ Nice to Have

  • Experience with GitLab CI/CD
  • Familiarity with HashiCorp Packer
  • Knowledge of AWS/GCP cloud environments