3h ago
Staff Software Engineer, Inference
London, UK
$325,000-$390,000 / year
full-time, senior, Artificial Intelligence
Description
You will build and maintain the critical systems that serve Claude to millions of users worldwide, working across the entire inference stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators.
Requirements
- Significant software engineering experience with distributed systems
- Experience with performance optimization and large-scale service orchestration
- Familiarity with LLM inference optimization, batching strategies, and multi-accelerator deployments
- Experience with Kubernetes and cloud infrastructure (AWS, GCP)
- Proficiency in Python or Rust
Responsibilities
- Designing intelligent routing algorithms for request distribution across thousands of accelerators
- Autoscaling the compute fleet to match supply with demand across production and research workloads
- Building production-grade deployment pipelines for releasing new models
- Integrating new AI accelerator platforms
- Contributing to inference features like structured sampling and prompt caching