Member of Technical Staff - ML Systems & Inference

San Francisco

$180k–$250k / year (est.)

full-time · senior · ai-ml

💼 About This Role

You'll design and build the inference systems that execute full models end-to-end under real production constraints. You'll work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable. This role is ideal for engineers who deeply understand how modern models execute in practice.

🎯 What You'll Do

  • Design and optimize end-to-end inference pipelines from request ingestion through execution and response
  • Build and evolve inference runtimes balancing latency, throughput, and concurrency
  • Reason about batching, queuing, and scheduling tradeoffs including tail latency and fairness
  • Manage KV cache allocation, placement, reuse, and eviction across models and requests
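
To give a concrete flavor of the batching and tail-latency tradeoffs mentioned above, here is a minimal, hypothetical sketch (not the team's actual system) of a dynamic batcher: it releases a batch either when enough requests have accumulated or when the oldest request has waited too long, trading batch efficiency against tail latency.

```python
import time
from collections import deque

class DynamicBatcher:
    """Toy dynamic batcher. A batch is released when it reaches
    max_batch_size, or when the oldest queued request has waited
    longer than max_wait_s (bounding tail latency at the cost of
    running smaller, less efficient batches)."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = deque()  # (request, enqueue_time) pairs

    def submit(self, request, now=None):
        # Record arrival time so we can enforce the wait bound.
        self.queue.append((request, now if now is not None else time.monotonic()))

    def maybe_form_batch(self, now=None):
        """Return a list of requests to run now, or None to keep waiting."""
        if not self.queue:
            return None
        now = now if now is not None else time.monotonic()
        oldest_wait = now - self.queue[0][1]
        if len(self.queue) >= self.max_batch_size or oldest_wait >= self.max_wait_s:
            n = min(self.max_batch_size, len(self.queue))
            return [self.queue.popleft()[0] for _ in range(n)]
        return None
```

Raising `max_wait_s` improves throughput (fuller batches) but pushes up p99 latency for requests that arrive during quiet periods; a real runtime would also weigh per-request deadlines and fairness across tenants.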

📋 Requirements

  • Strong software engineering fundamentals
  • Experience building or operating ML inference or model serving systems
  • Comfort reasoning about performance, memory usage, and system behavior under load

✨ Nice to Have

  • Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems
  • Deep understanding of modern model architectures and attention mechanisms
  • Experience with batching, scheduling, and concurrency control in inference systems