Senior Site Reliability Engineer, Production Engineering
Seattle, Washington, United States
full-time, senior, defense technology
Description
You will build and maintain the infrastructure, tooling, and processes that ensure the Lattice platform operates reliably 24/7/365. Working with platform and product teams, you'll proactively identify reliability risks, implement defensive strategies, and drive incident response to directly support national security missions.
Requirements
- 7+ years of engineering experience, including 3+ years focused on SRE or production operations
- Bachelor's degree in Computer Science or equivalent practical experience
- Deep Kubernetes expertise in production environments (100+ nodes)
- Strong programming in Go, Python, Rust, or Java
- Proven experience with observability stacks (Prometheus, Grafana, ELK/EFK)
Responsibilities
- Design monitoring, observability, and alerting systems for early detection of reliability issues
- Drive incident response and conduct blameless postmortems
- Build infrastructure automation using Terraform, Kubernetes operators, and custom tooling
- Establish SLOs and error budgets to balance velocity with reliability
- Partner with software engineering teams to improve system architecture for reliability