Staff Site Reliability Engineer
San Jose, California, United States
$133,400-$200,000 / year
Full-time | Senior | Aviation/Aerospace
Description
You will join Archer's SRE organization as a technical specialist, building custom internal tools and refining observability to ensure system resilience. You'll standardize incident response procedures, engineer SLO/SLI frameworks, and automate operational tasks to reduce toil.
Requirements
- 8+ years in SRE, Production Engineering, or high-scale DevOps
- Strong software engineering fundamentals and the ability to write microservices and APIs
- Deep understanding of deriving SLIs from distributed systems (latency percentiles, success rates)
- Expert-level Kubernetes experience (EKS, GKE, or self-managed) and advanced Terraform or Pulumi skills
- Mastery of the observability pillars (metrics, traces, logs) using tools such as Prometheus, Jaeger, or Datadog
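To illustrate the SLI derivation named in the requirements, here is a minimal Python sketch that computes a success rate and a nearest-rank latency percentile from raw request samples. The `Request` record, the rule that only 5xx responses count as failures, and the nearest-rank percentile method are assumptions chosen for illustration. Production systems often derive these values from histogram metrics rather than raw samples.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    # Hypothetical request record: observed latency and HTTP status code.
    latency_ms: float
    status: int


def success_rate(requests: Sequence[Request]) -> float:
    """Fraction of requests without a server-side failure.

    Assumption: only 5xx responses count as failures; 4xx client errors
    are treated as successful from the service's point of view.
    """
    if not requests:
        raise ValueError("no requests to evaluate")
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_percentile(requests: Sequence[Request], p: float) -> float:
    """Nearest-rank latency percentile for p in (0, 100]."""
    if not requests:
        raise ValueError("no requests to evaluate")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    latencies = sorted(r.latency_ms for r in requests)
    rank = math.ceil(p / 100 * len(latencies))  # 1-based nearest rank
    return latencies[rank - 1]
```

For example, with four samples where one request returned 503, `success_rate` returns 0.75.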
Responsibilities
- Standardize incident response, error budget tracking, and production readiness procedures
- Engineer SLOs and SLIs with backend logic for error budget calculations and automated alerts
- Build special-purpose tooling (e.g., Kubernetes operators, automated remediation, deployment safety gates)
- Create unified observability dashboards (Grafana/Datadog) for debugging and executive SLA monitoring
- Automate repetitive operational tasks to reduce toil
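The error budget calculations and automated alerts listed above can be sketched as follows. This minimal Python illustration is not Archer's tooling: `ErrorBudget`, `should_page`, and the default page threshold of 14.4 are hypothetical. Burn rate is the observed error rate divided by the allowed error rate (1 minus the SLO target). A threshold of 14.4 corresponds to spending 2% of a 30-day budget in one hour (0.02 × 720 h / 1 h).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    slo_target: float   # e.g. 0.999 for a 99.9% availability SLO
    total_events: int   # valid requests counted in the window
    bad_events: int     # requests that violated the SLI

    def __post_init__(self) -> None:
        if not 0 < self.slo_target < 1:
            raise ValueError("slo_target must be in (0, 1)")
        if self.total_events <= 0 or not 0 <= self.bad_events <= self.total_events:
            raise ValueError("need total_events > 0 and 0 <= bad_events <= total_events")

    @property
    def allowed_error_rate(self) -> float:
        # The error budget expressed as a rate: 1 - SLO target.
        return 1.0 - self.slo_target

    @property
    def observed_error_rate(self) -> float:
        return self.bad_events / self.total_events

    @property
    def burn_rate(self) -> float:
        # 1.0 means the budget is spent exactly as fast as the SLO allows.
        return self.observed_error_rate / self.allowed_error_rate

    @property
    def budget_remaining(self) -> float:
        # Assumes the counts cover the full SLO window.
        # Fraction of the budget left; negative once the SLO is breached.
        allowed_bad = self.total_events * self.allowed_error_rate
        return 1.0 - self.bad_events / allowed_bad


def should_page(burn_rate: float, threshold: float = 14.4) -> bool:
    # Hypothetical fast-burn rule: page when the short-window burn rate
    # reaches the threshold.
    return burn_rate >= threshold
```

A real alerting pipeline would typically evaluate burn rates over more than one window to balance detection speed against false pages. This sketch checks only a single value.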