8h ago

Staff / Principal Machine Learning Engineer, Serving

Mountain View, California, USA

$270k-$500k / year

full-time lead Hybrid ai-ml

💼 About This Role

You'll lead inference optimization and model serving for a top AI research lab, building real-time multimodal systems. Your work directly powers products used by leading companies, and you'll own full-cycle delivery from research to production.

🎯 What You'll Do

  • Optimize inference serving frameworks like vLLM or TRT-LLM
  • Profile and accelerate model performance on NVIDIA GPUs
  • Design distributed systems for multi-GPU/multi-node inference
  • Containerize and deploy models to production reliably

📋 Requirements

  • Deep understanding of serving frameworks (vLLM, TRT-LLM)
  • Hands-on experience with quantization, distillation, caching
  • Proficiency in C++, CUDA, Rust, or optimized Python
  • Experience with Kubernetes, Ray, and distributed scaling

✨ Nice to Have

  • Non-trivial open-source contributions to inference engines
  • PhD in CS, Physics, or Math, or equivalent practical experience
  • Public technical write-ups or deep-dive systems projects

🎁 Benefits & Perks

  • 💰 Competitive base salary of $270k-$500k, plus bonus and equity
  • 🏢 Relocation assistance to Mountain View office
  • 📈 Equity in a top AI startup backed by major VCs
  • 🏥 Benefits package (details not specified)