ML Model Serving Engineer
San Francisco
$175k-$280k / year
Full-time · Senior · AI/ML
💼 About This Role
You'll turbocharge our serving layer for LLM, speech, and vision models, partnering with the ML infrastructure and training teams to build a fast, cost-effective serving system. You'll modify frameworks such as vLLM and SGLang to implement cutting-edge optimization techniques.
🎯 What You'll Do
- Optimize serving layer for LLM, speech, and vision models
- Modify and extend LLM serving frameworks such as vLLM and SGLang
- Use in-flight batching, caching, and custom kernels to speed inference
- Reduce model initialization times while maintaining quality
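To give a feel for the in-flight batching mentioned above, here is a minimal, illustrative sketch of the scheduling idea behind frameworks like vLLM and SGLang: new requests join the running batch as soon as finished sequences free a slot, rather than waiting for an entire batch to drain. All names and the token-countdown model are simplifications invented for this sketch, not the API of any real framework.

```python
from collections import deque

class InFlightBatcher:
    """Toy continuous (in-flight) batching scheduler.

    Each request is modeled only by how many decode steps it still
    needs; a real server would run a model forward pass per step.
    """

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.waiting = deque()   # requests not yet admitted
        self.running = []        # (request_id, tokens_remaining)

    def submit(self, request_id, tokens_to_generate):
        self.waiting.append((request_id, tokens_to_generate))

    def step(self):
        """Run one decode iteration; return ids that finished this step."""
        # Admit waiting requests into any free slots (the in-flight part:
        # admission happens every step, not only between batches).
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        # One decode step: every running sequence emits one token.
        self.running = [(rid, left - 1) for rid, left in self.running]
        done = [rid for rid, left in self.running if left == 0]
        self.running = [(rid, left) for rid, left in self.running if left > 0]
        return done
```

For example, with `max_batch_size=2` and three requests, the third request is admitted on the very step after the first one finishes, keeping the batch full instead of idling a slot.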
📋 Requirements
- Expert in PyTorch or similar differentiable array computing framework
- Expert in optimizing ML models for serving with high throughput and low latency
- Significant systems programming experience with high-performance server systems
- Significant performance engineering experience in bottleneck analysis
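The performance-engineering requirement above usually starts with measurement. As a hedged illustration (the harness and all names here are hypothetical, not part of any framework named in this posting), a first-pass latency/throughput probe for a serving handler might look like:

```python
import statistics
import time

def profile_serving(handler, requests, warmup=3):
    """Minimal single-threaded latency/throughput harness.

    `handler` is any callable that serves one request. Returns median
    and tail latency in seconds plus overall requests-per-second, the
    numbers a bottleneck analysis typically begins from.
    """
    for r in requests[:warmup]:
        handler(r)  # warm caches / lazy initialization before measuring

    latencies = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        handler(r)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    return {
        "p50_s": statistics.median(latencies),
        "p99_s": statistics.quantiles(latencies, n=100)[98],
        "throughput_rps": len(requests) / elapsed,
    }
```

Comparing p50 against p99 is often the quickest way to tell steady-state cost from stragglers (e.g. batching stalls or cold caches) before reaching for a profiler.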
✨ Nice to Have
- Familiarity with vLLM or SGLang internals and deployment
- Experience with GCP, AWS, or Azure cloud platforms
- Experience deploying inference workloads with Kubernetes or Ray
🎁 Benefits & Perks
- 💰 401(k) match up to 3.5% of compensation
- 🏥 100% employer-paid health, vision, dental for you and dependents
- 🏖️ Unlimited PTO and sick time
- 💵 Flexible spending account with employer match up to $1,650/year
- 📈 Competitive stock options