4h ago
Staff Software Engineer, AI Reliability Engineering
Dublin, IE
$235,000-$295,000 / year
full-time, senior, Artificial Intelligence, Visa Sponsor
Description
As a Staff Software Engineer on AI Reliability Engineering, you'll partner with teams across Anthropic to improve reliability across critical serving paths, from SDK through network, API layers, and accelerators. You'll design monitoring systems, set SLOs, lead incident response, and ensure the systems delivering Claude are robust and resilient.
Requirements
- Strong distributed systems, infrastructure, or reliability background
- Curiosity and comfort jumping into unfamiliar systems during incidents
- Holistic thinking about how systems are composed and where their seams lie
- Ability to build lasting cross-team relationships
- Excellent communication and collaboration skills
Responsibilities
- Develop appropriate Service Level Objectives for large language model serving systems
- Design and implement monitoring and observability systems across the token path
- Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers
- Lead incident response for critical AI services, ensuring rapid recovery and systematic improvements
- Support the reliability of safeguard model serving to meet site reliability and safety commitments