Staff Software Engineer, AI Reliability Engineering
London, UK
$325,000-$390,000 / year
Full-time, Senior, Artificial Intelligence, Visa Sponsor
Description
You'll join the AI Reliability Engineering (AIRE) team to improve reliability across Anthropic's critical serving paths, from the SDK through the network and API layers to the accelerators. You'll develop SLOs, design observability systems, lead incident response, and collaborate cross-functionally to ensure Claude remains reliable for all users.
Requirements
- Strong distributed systems, infrastructure, or reliability background
- Comfortable jumping into unfamiliar systems during incidents
- Holistic systems thinking
- Excellent communication and cross-team collaboration
- Ownership over outcomes for systems you don't own
Responsibilities
- Develop Service Level Objectives for LLM serving systems
- Design and implement monitoring and observability across the token path
- Assist in designing high-availability serving infrastructure across regions and cloud providers
- Lead incident response for critical AI services
- Support reliability of safeguard model serving