Staff Software Engineer, AI Reliability Engineering

London, UK

$325,000-$390,000 / year

Full-time, Senior, Artificial Intelligence, Visa Sponsor

Description

You'll join the AI Reliability Engineering (AIRE) team to improve reliability across Anthropic's critical serving paths, from the SDK through the network and API layers to the accelerators. You'll develop SLOs, design observability systems, lead incident response, and collaborate cross-functionally to keep Claude reliable for all users.

Requirements

  • Strong distributed systems, infrastructure, or reliability background
  • Comfortable jumping into unfamiliar systems during incidents
  • Holistic systems thinking
  • Excellent communication and cross-team collaboration
  • Ownership over outcomes for systems you don't own

Responsibilities

  • Develop Service Level Objectives for LLM serving systems
  • Design and implement monitoring and observability across the token path
  • Assist in designing high-availability serving infrastructure across regions and cloud providers
  • Lead incident response for critical AI services
  • Support reliability of safeguard model serving