Software Engineer, Infrastructure Reliability

San Francisco

$255k-$385k / year

full-time · senior · ai-ml

💼 About This Role

You'll be at the heart of scaling and hardening the infrastructure that powers AI systems like ChatGPT, ensuring high reliability and performance for millions of users. You'll drive automation and system resilience while collaborating with infra, product, and research teams.

🎯 What You'll Do

  • Design, build, and operate reliable systems across engineering.
  • Identify and fix performance bottlenecks and inefficiencies.
  • Contribute to incident response and postmortems.
  • Improve automation and internal tooling.

📋 Requirements

  • 4+ years of relevant industry experience.
  • Strong proficiency in cloud infrastructure (AWS, GCP, Azure) and Terraform.
  • Experience with Kubernetes at scale.
  • Experience with observability tools like Datadog, Prometheus, Grafana.

✨ Nice to Have

  • Experience with service mesh technologies.
  • Knowledge of microservices architecture.
  • Experience as a tech lead on large-scale projects.