Member of Technical Staff, Inference

San Francisco

$200k-$400k / year

Full-time · Remote · AI/ML · Visa Sponsorship

💼 About This Role

You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference. You'll sit at the intersection of models and hardware, pushing the boundaries of LLM and diffusion model serving.

🎯 What You'll Do

  • Optimize inference runtime for LLM and diffusion models
  • Implement and refine model architectures from research papers
  • Contribute performant, maintainable code to the vLLM codebase
  • Debug and profile complex ML codebases

📋 Requirements

  • Bachelor's degree in computer science or equivalent experience
  • Deep understanding of transformer architectures
  • Strong Python programming skills and familiarity with PyTorch internals
  • Experience with LLM inference systems (vLLM, TensorRT-LLM, etc.)

✨ Nice to Have

  • Deep understanding of KV-cache and prefix caching
  • Familiarity with RL frameworks for LLMs
  • Experience with multimodal inference

🎁 Benefits & Perks

  • 🏖️ Equity included
  • 🏥 Health, dental, and vision benefits
  • 💰 401(k) company match