Member of Technical Staff, Inference

San Francisco

$200k-$400k / year

Full-time · Remote · AI/ML · Visa Sponsorship

💼 About This Role

You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference. You'll sit at the intersection of models and hardware, pushing the boundaries of LLM and diffusion model serving.

🎯 What You'll Do

  • Optimize inference runtime for LLM and diffusion models
  • Implement and refine model architectures from research papers
  • Contribute performant, maintainable code to the vLLM codebase
  • Debug and profile complex ML codebases

📋 Requirements

  • Bachelor's degree in computer science or equivalent experience
  • Deep understanding of transformer architectures
  • Strong Python programming skills and familiarity with PyTorch internals
  • Experience with LLM inference systems (vLLM, TensorRT-LLM, etc.)

✨ Nice to Have

  • Deep understanding of KV-cache and prefix caching
  • Familiarity with RL frameworks for LLMs
  • Experience with multimodal inference

🎁 Benefits & Perks

  • 🏖️ Equity included
  • 🏥 Health, dental, and vision benefits
  • 💰 401(k) company match