10h ago
Senior Software Engineer - Model Performance
San Francisco
$220k-$320k / year
full-time, mid, ai-ml
💼 About This Role
You'll make our inference stack as fast and efficient as possible, working across every layer from CUDA kernels to serving frameworks to eliminate bottlenecks. Your north star is inference performance: latency, throughput, and cost efficiency. You'll have autonomy, a large compute budget, and the technical support to push those limits.
🎯 What You'll Do
- Implement optimization techniques such as quantization and speculative decoding
- Dig into inference frameworks to debug and improve performance
- Profile and optimize CUDA kernels and GPU utilization
- Build tooling and benchmarks to track inference performance
📋 Requirements
- 2+ years of experience in ML systems or inference optimization
- Strong proficiency in Python and familiarity with C++
- Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM)
- Deep understanding of GPU architecture and experience profiling GPU workloads
✨ Nice to Have
- Experience with CUDA programming
- Familiarity with serving non-LLM models (TTS, vision, embeddings)
- Contributions to open-source inference frameworks
🎁 Benefits & Perks
- 💰 Competitive base salary $220,000 - $320,000
- 📈 Equity in a high-growth startup
- 🏥 Comprehensive benefits
- 🏢 In-person collaboration in downtown SF