Distributed Systems Engineer

San Francisco

$200k-$300k / year (est.)

Full-time · Senior · Hybrid · AI/ML

💼 About This Role

You'll build and operate the systems that turn raw compute into useful intelligence — the inference services that serve LLMs at scale and the data pipelines that feed them. One week you're hunting a tail-latency regression; the next you're redesigning a Ray pipeline for petabyte scale. You'll partner directly with researchers and ML engineers to make experimental workloads run reliably in production.

🎯 What You'll Do

  • Design and operate distributed inference systems for LLMs.
  • Build large-scale data pipelines with Ray Data or Spark.
  • Debug production failure modes and improve observability.
  • Partner with researchers to productionize experimental workloads.

📋 Requirements

  • 5+ years building distributed systems in production.
  • Experience with Ray, Spark, or similar large-scale framework.
  • Strong fluency in Python and a systems language (Go, Rust, C++).
  • Knowledge of GPU/accelerator stack (CUDA, NCCL, mixed precision).
  • Experience with Kubernetes and operating production incidents.

✨ Nice to Have

  • Hands-on with LLM inference engines (vLLM, SGLang, etc.).
  • Experience with lakehouse formats (Iceberg, Delta, Hudi).
  • Open-source contributions to relevant projects.

🎁 Benefits & Perks

  • 🌍 Annual travel stipend to explore a new country.
  • 🍽️ Weekly lunch stipend for take-out or groceries.
  • 🏥 Comprehensive medical benefits.
  • 🏖️ Generous paid time off.
  • 🤝 Flexible hybrid work with in-person collaboration in the Bay Area.

📨 Hiring Process

Estimated timeline: 2-4 weeks (AI estimate)

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. System Design Interview · 60 min
  4. Hiring Manager · 45 min
  5. Offer · 15 min