Distributed Systems Engineer

San Francisco

$200k-$300k / year (est.)

Full-time · Senior · Hybrid · AI/ML

💼 About This Role

You'll build and operate the systems that turn raw compute into useful intelligence — the inference services that serve LLMs at scale and the data pipelines that feed them. One week you're hunting a tail-latency regression; the next you're redesigning a Ray pipeline for petabyte scale. You'll partner directly with researchers and ML engineers to make experimental workloads run reliably in production.

🎯 What You'll Do

  • Design and operate distributed inference systems for LLMs.
  • Build large-scale data pipelines with Ray Data or Spark.
  • Debug production failure modes and improve observability.
  • Partner with researchers to productionize experimental workloads.

📋 Requirements

  • 5+ years building distributed systems in production.
  • Experience with Ray, Spark, or similar large-scale framework.
  • Strong fluency in Python and a systems language (Go, Rust, C++).
  • Knowledge of GPU/accelerator stack (CUDA, NCCL, mixed precision).
  • Experience with Kubernetes and operating production incidents.

✨ Nice to Have

  • Hands-on with LLM inference engines (vLLM, SGLang, etc.).
  • Experience with lakehouse formats (Iceberg, Delta, Hudi).
  • Open-source contributions to relevant projects.

🎁 Benefits & Perks

  • 🌍 Annual travel stipend to explore a new country.
  • 🍽️ Weekly lunch stipend for take-out or groceries.
  • 🏥 Comprehensive medical benefits.
  • 🏖️ Generous paid time off.
  • 🤝 Flexible hybrid work with in-person collaboration in the Bay Area.

📨 Hiring Process

Estimated timeline: 2-4 weeks (AI estimate)

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. System Design Interview · 60 min
  4. Hiring Manager · 45 min
  5. Offer · 15 min