Staff Cloud Site Reliability Engineer at Jobs at Wayve | First — CareerPair

8h ago

Staff Cloud Site Reliability Engineer

London

✨ $180k-$250k / year (est.)

Full-time · Lead · Hybrid · Autonomous Vehicles / AI

💼 About This Role

You'll build and scale the reliability foundations of our AI cloud platform, including model development and GPU compute. You'll define SLOs, automation, and operational standards for large-scale distributed systems. This is a founding SRE role where you'll shape the function from scratch.

🎯 What You'll Do

Own reliability, availability, and performance of cloud platforms.
Define and operationalize SLOs, SLIs, and error budgets.
Participate in a 24/7 on-call rotation for incident response.
Build automation for cluster operations and scaling tasks.

📋 Requirements

Proven experience in an SRE role supporting large-scale cloud systems.
Strong Kubernetes experience including production clusters.
Hands-on experience with AWS, GCP, or Azure.
Experience operating complex distributed systems with compute-heavy workloads.

✨ Nice to Have

Experience operating GPU-backed environments or ML infrastructure.
Familiarity with infrastructure-as-code (e.g., Terraform).
Experience defining SLOs/SLIs and building reliability programs.

🎁 Benefits & Perks

🚀 Founding role with high impact on AI infrastructure.
💻 Hybrid work: 2 days per week in the London office.
🌍 Work with cutting-edge Embodied AI technology.
