Staff SRE Engineer

Austin, Texas, United States
full-time · senior · real estate

Description

You will shape the reliability, observability, and operational excellence of our platform infrastructure serving millions of users. As a technical leader and mentor, you'll establish best practices, drive architectural decisions, and enable 600+ engineers to deliver exceptional customer experiences using AWS, Kubernetes, and modern observability tools.

Requirements

  • 8+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • 5+ years of hands-on experience with AWS and Kubernetes, including multi-cluster management
  • Strong programming skills (Python, Go, or Java) applied to infrastructure automation and IaC (Terraform, CloudFormation)
  • Production experience with distributed systems and observability tools (New Relic, Datadog, Prometheus, Grafana, Splunk)
  • Experience with CI/CD platforms and GitOps workflows (CircleCI, Argo CD, Jenkins), plus on-call rotation and incident response

Responsibilities

  • Design and maintain highly available AWS infrastructure, including EKS clusters, Fargate, and multi-region architectures
  • Own reliability of critical services: Skyway (CI/CD), Frontdoor (Tyk), Pantheon (Apollo GraphQL), and supporting infrastructure
  • Establish SLIs, SLOs, and error budgets for Tier 1/2/3 systems; lead architectural reviews for reliability and cost-efficiency
  • Build comprehensive observability using New Relic for APM, distributed tracing, metrics, and logging
  • Design chaos engineering experiments and lead game day exercises to identify system weaknesses