
Engineering Manager, Inference Routing and Performance

San Francisco, CA | New York City, NY
Full-time | Manager | Artificial Intelligence

Description

You will lead the team building the cluster-level routing and coordination layer for Anthropic's inference fleet. This layer makes real-time decisions on request routing, cache placement, and cross-replica coordination to maximize throughput and efficiency. You will drive system-level performance improvements, own operational excellence, and grow a team of engineers with deep distributed-systems expertise.

Requirements

  • Enough systems depth to make architectural calls and evaluate technical candidates
  • Experience with distributed systems and production reliability
  • Ability to analyze performance at kernel, network, and framework levels
  • Experience building and managing a team of engineers
  • Strong communication and cross-team collaboration skills

Responsibilities

  • Drive system-level performance improvements that raise cluster-level inference efficiency
  • Own the technical roadmap for routing, cache placement, and cross-replica coordination
  • Set the technical strategy for routing across heterogeneous hardware and serving surfaces
  • Run the team's operational backbone: on-call, incident response, postmortems, and deploy safety
  • Develop and retain the engineering team, and hire for OS and framework depth