Staff Site Reliability Engineer - Observability at Careers — CareerPair

1h ago

Staff Site Reliability Engineer - Observability

San Francisco, California

$194,000-$267,000 / year

Full-time, Senior, Hybrid, Identity and security

Description

You will own and expand the observability ecosystem into GCP, moving beyond monitoring to deliver a world-class, scalable platform that enables SRE teams and business partners. You will treat infrastructure as code, automating deployment of agents and collectors across complex distributed systems.

Requirements

5+ years scaling and managing observability in Google Cloud Platform
Expertise in creating intuitive, actionable Splunk or Grafana dashboards
3+ years in an SRE, DevOps, or Systems Engineering role focused on high-availability systems
Strong coding skills in Python and Go
Deep understanding of Linux internals, networking, and Kubernetes/GKE

Responsibilities

Design, build, and maintain scalable observability infrastructure using Terraform
Optimize the collection, processing, and storage of observability data to keep Splunk and Grafana services highly reliable and low-latency
Participate in on-call rotations and lead post-incident reviews
Automate deployment and scaling of observability agents and collectors
Eliminate toil through automation

Careers

Help us build the next generation of corporate IT by bringing your talent and motivation to Okta, the leader in identity and access management.
