Senior Site Reliability Engineer - Observability
Bellevue, Washington
$147,000-$202,000 / year
Full-time, Senior, Hybrid, Identity and Access Management / Security
Description
You will own and evolve our Splunk ecosystem, moving beyond simple monitoring to deliver a world-class Observability Platform. You'll treat infrastructure as code using Terraform and automate agent deployments across complex distributed systems.
Requirements
- 5+ years scaling and managing Splunk Cloud (1,000+ SVCs), including Workload Management (WLM) and HTTP Event Collector (HEC) optimization
- Expertise in creating actionable Splunk dashboards correlating data from multiple sources
- 3+ years of SRE, DevOps, or Systems Engineering experience focused on high-availability systems
- Strong coding skills in SPL and Go for building internal tools and automation
- Deep understanding of Linux internals, networking, and Kubernetes/EKS
Responsibilities
- Design, build, and maintain scalable observability infrastructure using Terraform
- Optimize collection, processing, and storage of log data in Splunk for high reliability and low latency
- Participate in on-call rotations and lead post-incident reviews
- Eliminate toil by automating deployment and scaling of observability agents and collectors