Staff Site Reliability Engineer - Observability

San Francisco, California

$194,000-$267,000 / year

Full-time, senior, hybrid, identity and security

Description

You will own the observability ecosystem and expand it into GCP, moving beyond monitoring to deliver a world-class, scalable platform that supports SRE teams and business partners. You will treat infrastructure as code, automating the deployment of agents and collectors across complex distributed systems.

Requirements

  • 5+ years scaling and managing observability in Google Cloud Platform
  • Expertise in creating intuitive, actionable Splunk or Grafana dashboards
  • 3+ years in an SRE, DevOps, or Systems Engineering role focused on high-availability systems
  • Strong coding skills in Python and Go
  • Deep understanding of Linux internals, networking, and Kubernetes/GKE

Responsibilities

  • Design, build, and maintain scalable observability infrastructure using Terraform
  • Optimize the collection, processing, and storage of observability data to keep Splunk and Grafana services highly reliable and low-latency
  • Participate in on-call rotations and lead post-incident reviews
  • Automate deployment and scaling of observability agents and collectors
  • Eliminate toil through automation