Engineering Manager, Inference Routing and Performance
San Francisco, CA | New York City, NY
Full-time | Manager | Artificial Intelligence
Description
You will lead a team building the cluster-level routing and coordination layer for Anthropic's inference fleet, making real-time decisions on request routing, cache placement, and cross-replica coordination to maximize throughput and efficiency. You'll drive system-level performance improvements, manage operational excellence, and grow a team of engineers with deep distributed-systems expertise.
Requirements
- Systems depth sufficient to make architectural calls and evaluate technical candidates
- Experience with distributed systems and production reliability
- Ability to analyze performance at kernel, network, and framework levels
- Experience building and managing a team of engineers
- Strong communication and cross-team collaboration skills
Responsibilities
- Drive system-level performance improvements that raise cluster-level inference efficiency
- Own technical roadmap for routing, cache placement, and cross-replica coordination
- Set technical strategy for routing across heterogeneous hardware and serving surfaces
- Run the team's operational backbone: on-call, incident response, postmortems, deploy safety
- Develop and retain the engineering team; hire for OS and framework depth