Senior Site Reliability Engineer, Production Engineering

Seattle, Washington, United States
Full-time, senior, defense technology

Description

You will build and maintain the infrastructure, tooling, and processes that keep the Lattice platform operating reliably 24/7/365. Working with platform and product teams, you will proactively identify reliability risks, implement defensive strategies, and drive incident response in direct support of national security missions.

Requirements

  • 7+ years of engineering experience, including 3+ years focused on SRE or production operations
  • Bachelor's degree in Computer Science, or equivalent practical experience
  • Deep Kubernetes expertise in production environments (100+ nodes)
  • Strong programming skills in Go, Python, Rust, or Java
  • Proven experience with observability stacks (Prometheus, Grafana, ELK/EFK)

Responsibilities

  • Design monitoring, observability, and alerting systems for early detection of reliability issues
  • Drive incident response and conduct blameless postmortems
  • Build infrastructure automation using Terraform, Kubernetes operators, and custom tooling
  • Establish SLOs and error budgets to balance velocity with reliability
  • Partner with software engineering teams to improve system architecture for reliability