Staff Software Engineer, AI Reliability Engineering

Dublin, IE

$235,000-$295,000 / year

Full-time, Senior, Artificial Intelligence, Visa Sponsor

Description

As a Staff Software Engineer on AI Reliability Engineering, you'll partner with teams across Anthropic to improve reliability across critical serving paths, from SDK through network, API layers, and accelerators. You'll design monitoring systems, set SLOs, lead incident response, and ensure the systems delivering Claude are robust and resilient.

Requirements

  • Strong background in distributed systems, infrastructure, or reliability
  • Curiosity and comfort jumping into unfamiliar systems during incidents
  • Ability to think holistically about how systems are composed and where their seams lie
  • Ability to build lasting cross-team relationships
  • Excellent communication and collaboration skills

Responsibilities

  • Develop appropriate Service Level Objectives for large language model serving systems
  • Design and implement monitoring and observability systems across the token path
  • Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers
  • Lead incident response for critical AI services, ensuring rapid recovery and systematic improvements
  • Support the reliability of safeguard model serving to meet site reliability and safety commitments