Member of Technical Staff - ML Systems & Inference

San Francisco

$180k–$250k / year (est.)

full-time · senior · ai-ml

💼 About This Role

You'll design and build the inference systems that execute full models end-to-end under real production constraints. You'll work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable. This role is ideal for engineers who deeply understand how modern models execute in practice.

🎯 What You'll Do

  • Design and optimize end-to-end inference pipelines from request ingestion through execution and response
  • Build and evolve inference runtimes balancing latency, throughput, and concurrency
  • Reason about batching, queuing, and scheduling tradeoffs including tail latency and fairness
  • Manage KV cache allocation, placement, reuse, and eviction across models and requests
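
To give a concrete flavor of the batching and tail-latency tradeoffs mentioned above, here is a minimal, hypothetical sketch (not the team's actual system) of a dynamic batcher: it releases a batch either when enough requests have accumulated or when the oldest request has waited too long, trading batch efficiency against tail latency.

```python
import time
from collections import deque

class DynamicBatcher:
    """Toy dynamic batcher. A batch is released when it reaches
    max_batch_size, or when the oldest queued request has waited
    longer than max_wait_s (bounding tail latency at the cost of
    running smaller, less efficient batches)."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = deque()  # (request, enqueue_time) pairs

    def submit(self, request, now=None):
        # Record arrival time so we can enforce the wait bound.
        self.queue.append((request, now if now is not None else time.monotonic()))

    def maybe_form_batch(self, now=None):
        """Return a list of requests to run now, or None to keep waiting."""
        if not self.queue:
            return None
        now = now if now is not None else time.monotonic()
        oldest_wait = now - self.queue[0][1]
        if len(self.queue) >= self.max_batch_size or oldest_wait >= self.max_wait_s:
            n = min(self.max_batch_size, len(self.queue))
            return [self.queue.popleft()[0] for _ in range(n)]
        return None
```

Raising `max_wait_s` improves throughput (fuller batches) but pushes up p99 latency for requests that arrive during quiet periods; a real runtime would also weigh per-request deadlines and fairness across tenants.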

📋 Requirements

  • Strong software engineering fundamentals
  • Experience building or operating ML inference or model serving systems
  • Comfort reasoning about performance, memory usage, and system behavior under load

✨ Nice to Have

  • Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems
  • Deep understanding of modern model architectures and attention mechanisms
  • Experience with batching, scheduling, and concurrency control in inference systems