Staff / Principal Machine Learning Engineer, Serving
Mountain View, California, USA
$270k-$500k / year
Full-time · Lead · Hybrid · AI/ML
💼 About This Role
You'll lead inference optimization and model serving for a top AI research lab, building real-time multimodal systems. Your work directly powers products used by leading companies, and you'll own delivery end to end, from research through production.
🎯 What You'll Do
- Optimize inference-serving frameworks such as vLLM and TRT-LLM
- Profile and accelerate model performance on NVIDIA GPUs
- Design distributed systems for multi-GPU/multi-node inference
- Reliably containerize and deploy models to production
📋 Requirements
- Deep understanding of serving frameworks (vLLM, TRT-LLM)
- Hands-on experience with quantization, distillation, and caching
- Proficiency in C++, CUDA, Rust, or optimized Python
- Experience with Kubernetes, Ray, and distributed scaling
✨ Nice to Have
- Non-trivial open-source contributions to inference engines
- PhD in Computer Science, Physics, or Mathematics, or equivalent practical experience
- Public technical write-ups or deep-dive systems projects
🎁 Benefits & Perks
- 💰 Competitive base salary ($270k-$500k) plus bonus and equity
- 🏢 Relocation assistance to Mountain View office
- 📈 Equity in a top AI startup backed by major VCs
- 🏥 Benefits package (details not specified)