Member of Technical Staff - Inference
San Francisco, CA; Remote
$150k-$300k / year
Full-time · Senior · Hybrid · AI/ML · Visa Sponsor
💼 About This Role
You'll build the infrastructure to serve LLMs efficiently at scale and optimize inference systems that integrate with our RL training stack. Your work will directly enable researchers and enterprises to run reinforcement learning at frontier scale across global compute fleets.
🎯 What You'll Do
- Build a multi-tenant LLM serving platform across cloud GPU fleets.
- Design GPU-aware scheduling algorithms for heterogeneous accelerators.
- Integrate and optimize LLM inference frameworks such as vLLM and SGLang.
- Develop performance benchmark suites covering latency, throughput, and scalability.
📋 Requirements
- 3+ years building large-scale ML/LLM services with latency/availability SLOs.
- Hands-on with at least one of vLLM, SGLang, or TensorRT-LLM.
- Deep understanding of inference internals: prefill vs. decode phases, KV-cache management, batching, and speculative decoding.
- Proficiency in Python and PyTorch for systems tooling and inference integration.
✨ Nice to Have
- Familiarity with CUDA/Triton kernel development and profiling.
- Experience with Rust, C++, or systems performance languages.
- Contributions to open-source serving or inference projects.
🎁 Benefits & Perks
- 💰 Cash compensation of $150k-$300k plus significant equity.
- 🏖️ Flexible work (remote or San Francisco office).
- ✈️ Full visa sponsorship and relocation support.
- 📚 Professional development budget.
- 🎉 Regular team off-sites and conference attendance.