1h ago

Senior Site Reliability Engineer - Observability

Bellevue, Washington

$147,000-$202,000 / year

Full-time, Senior, Hybrid, Identity and Access Management / Security

Description

You will own and evolve our Splunk ecosystem, moving beyond simple monitoring to deliver a world-class Observability Platform. You'll treat infrastructure as code using Terraform and automate agent deployments across complex distributed systems.

Requirements

  • 5+ years scaling and managing Splunk Cloud (1000+ SVCs), including workload management (WLM) and HTTP Event Collector (HEC) optimization
  • Expertise in creating actionable Splunk dashboards correlating data from multiple sources
  • 3+ years of SRE, DevOps, or Systems Engineering experience focused on high-availability systems
  • Strong coding skills in SPL and Go for building internal tools and automation
  • Deep understanding of Linux internals, networking, and Kubernetes/EKS

Responsibilities

  • Design, build, and maintain scalable observability infrastructure using Terraform
  • Optimize collection, processing, and storage of log data in Splunk for high reliability and low latency
  • Participate in on-call rotations and lead post-incident reviews
  • Eliminate toil by automating deployment and scaling of observability agents and collectors