Staff Site Reliability Engineer - Observability
San Francisco, California
$194,000-$267,000 / year
Full-time, Senior, Hybrid, Identity and security
Description
You will own and expand the observability ecosystem into GCP, moving beyond monitoring to deliver a world-class, scalable platform that enables SRE teams and business partners. You will treat infrastructure as code, automating deployment of agents and collectors across complex distributed systems.
Requirements
- 5+ years scaling and managing observability in Google Cloud Platform
- Expertise in creating intuitive, actionable Splunk or Grafana dashboards
- 3+ years in an SRE, DevOps, or Systems Engineering role focused on high-availability systems
- Strong coding skills in Python and Go
- Deep understanding of Linux internals, networking, and Kubernetes/GKE
Responsibilities
- Design, build, and maintain scalable observability infrastructure using Terraform
- Optimize collection, processing, and storage of observability data to keep Splunk and Grafana services highly reliable and low-latency
- Participate in on-call rotations and lead post-incident reviews
- Automate deployment and scaling of observability agents and collectors
- Eliminate toil through automation