Software Engineer, Infrastructure Reliability
San Francisco
$255k-$385k / year
full-time, senior, ai-ml
💼 About This Role
You'll scale and harden the infrastructure that powers AI systems like ChatGPT, ensuring high reliability and performance for millions of users. You'll drive automation and system resilience while collaborating with infrastructure, product, and research teams.
🎯 What You'll Do
- Design, build, and operate reliable systems across engineering.
- Identify and fix performance bottlenecks and inefficiencies.
- Contribute to incident response and postmortems.
- Improve automation and internal tooling.
📋 Requirements
- 4+ years of relevant industry experience.
- Strong proficiency in cloud infrastructure (AWS, GCP, Azure) and Terraform.
- Experience with Kubernetes at scale.
- Experience with observability tools such as Datadog, Prometheus, and Grafana.
✨ Nice to Have
- Experience with service mesh technologies.
- Knowledge of microservices architecture.
- Experience as a tech lead on large-scale projects.