2h ago
Staff SRE Engineer
Austin, Texas, United States
Full-time, Senior, Real estate
Description
You will shape the reliability, observability, and operational excellence of our platform infrastructure serving millions of users. As a technical leader and mentor, you'll establish best practices, drive architectural decisions, and enable 600+ engineers to deliver exceptional customer experiences using AWS, Kubernetes, and modern observability tools.
Requirements
- 8+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering
- 5+ years hands-on experience with AWS and Kubernetes including multi-cluster management
- Strong programming skills (Python, Go, or Java) applied to infrastructure automation and IaC (Terraform, CloudFormation)
- Production experience with observability tools (New Relic, Datadog, Prometheus, Grafana, Splunk) and distributed systems
- Experience with CI/CD platforms and GitOps workflows (CircleCI, Argo CD, Jenkins), plus on-call rotation and incident response
Responsibilities
- Design and maintain highly available AWS infrastructure including EKS clusters, Fargate, and multi-region architectures
- Own reliability of critical services: Skyway (CI/CD), Frontdoor (Tyk), Pantheon (Apollo GraphQL), and supporting infrastructure
- Establish SLIs, SLOs, and error budgets for Tier 1/2/3 systems; lead architectural reviews for reliability and cost-efficiency
- Build comprehensive observability using New Relic for APM, distributed tracing, metrics, and logging
- Design chaos engineering experiments and lead game day exercises to identify system weaknesses