Staff Site Reliability Engineer

San Jose, California, United States

$133,400-$200,000 / year

full-time, senior, aviation/aerospace

Description

You will join Archer's SRE organization as a technical specialist, building custom internal tools and refining observability to ensure system resilience. You'll standardize incident response procedures, engineer SLO/SLI frameworks, and automate operational tasks to reduce toil.

Requirements

  • 8+ years in SRE, Production Engineering, or high-scale DevOps
  • Strong software engineering fundamentals and ability to write microservices and APIs
  • Deep understanding of deriving SLIs from distributed systems (latency percentiles, success rates)
  • Expert-level Kubernetes (EKS, GKE, or self-managed) and advanced proficiency in Terraform or Pulumi
  • Mastery of the observability pillars (metrics, traces, and logs) using tools such as Prometheus, Jaeger, or Datadog

Responsibilities

  • Standardize incident response, error budget tracking, and production readiness procedures
  • Engineer SLOs and SLIs with backend logic for error budget calculations and automated alerts
  • Build special-purpose tooling (e.g., Kubernetes operators, automated remediation, deployment safety gates)
  • Create unified observability dashboards (Grafana/Datadog) for debugging and executive SLA monitoring
  • Automate repetitive operational tasks to reduce toil