Distributed Systems Engineer
San Francisco
✨ $200k–$300k / year (est.)
Full-time · Senior · Hybrid · AI/ML
💼 About This Role
You'll build and operate the systems that turn raw compute into useful intelligence — the inference services that serve LLMs at scale and the data pipelines that feed them. One week you're hunting a tail-latency regression; the next you're redesigning a Ray pipeline for petabyte scale. You'll partner directly with researchers and ML engineers to make experimental workloads run reliably in production.
🎯 What You'll Do
- Design and operate distributed inference systems for LLMs.
- Build large-scale data pipelines with Ray Data or Spark.
- Debug production failure modes and improve observability.
- Partner with researchers to productionize experimental workloads.
📋 Requirements
- 5+ years building distributed systems in production.
- Experience with Ray, Spark, or similar large-scale framework.
- Strong fluency in Python and a systems language (Go, Rust, C++).
- Knowledge of GPU/accelerator stack (CUDA, NCCL, mixed precision).
- Experience with Kubernetes and operating production incidents.
✨ Nice to Have
- Hands-on with LLM inference engines (vLLM, SGLang, etc.).
- Experience with lakehouse formats (Iceberg, Delta, Hudi).
- Open-source contributions to relevant projects.
🎁 Benefits & Perks
- 🌍 Annual travel stipend to explore a new country.
- 🍽️ Weekly lunch stipend for take-out or groceries.
- 🏥 Comprehensive medical benefits.
- 🏖️ Generous paid time off.
- 🤝 Flexible hybrid work with in-person collaboration in the Bay Area.
📨 Hiring Process
Estimated timeline: 2–4 weeks
1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. System Design Interview · 60 min
4. Hiring Manager · 45 min
5. Offer · 15 min