Member of Technical Staff - Inference
San Francisco, CA; Remote
$150k-$300k / year
Full-time · Senior · Hybrid · AI/ML · Visa Sponsor
💼 About This Role
You'll build the infrastructure to serve LLMs efficiently at scale and optimize inference systems that integrate with our RL training stack. Your work will directly enable researchers and enterprises to run reinforcement learning at frontier scale across global compute fleets.
🎯 What You'll Do
- Build a multi-tenant LLM serving platform across cloud GPU fleets.
- Design GPU-aware scheduling algorithms for heterogeneous accelerators.
- Integrate and optimize LLM inference frameworks such as vLLM and SGLang.
- Develop performance benchmark suites covering latency, throughput, and scalability.
📋 Requirements
- 3+ years building large-scale ML/LLM services with latency/availability SLOs.
- Hands-on with at least one of vLLM, SGLang, or TensorRT-LLM.
- Deep understanding of inference internals: prefill vs. decode phases, KV-cache management, batching, and speculative decoding.
- Proficiency in Python and PyTorch for systems tooling and inference integration.
✨ Nice to Have
- Familiarity with CUDA/Triton kernel development and profiling.
- Experience with Rust, C++, or systems performance languages.
- Contributions to open-source serving or inference projects.
🎁 Benefits & Perks
- 💰 Cash compensation of $150k-$300k plus significant equity.
- 🏖️ Flexible work (remote or San Francisco office).
- ✈️ Full visa sponsorship and relocation support.
- 📚 Professional development budget.
- 🎉 Regular team off-sites and conference attendance.