19h ago
Software Engineer - Model Performance
San Francisco
$180k-$360k / year
full-time · Hybrid · ai-ml
💼 About This Role
You'll join the Model Performance team to implement cutting-edge techniques for ML model inference. You will optimize large language models and dig into codebases such as TensorRT and PyTorch to debug performance issues.
🎯 What You'll Do
- Implement quantization, speculative decoding, and other inference techniques.
- Debug ML performance issues in TensorRT, PyTorch, and CUDA codebases.
- Apply optimization techniques across a range of ML models.
- Own projects from idea to production.
📋 Requirements
- Bachelor's degree or higher in Computer Science, Engineering, or a related field.
- Experience with Python or C++.
- Familiarity with LLM optimization techniques like quantization.
- Strong familiarity with PyTorch, TensorRT, or TensorRT-LLM.
✨ Nice to Have
- Proficiency in improving the performance of LLM software systems.
- Experience with CUDA or similar technologies.
- Experience with Docker and Kubernetes.
🎁 Benefits & Perks
- 📅 Flexible PTO including company-wide Winter Break.
- 🏥 100% coverage of medical, dental, and vision insurance for employees and dependents.
- 👶 Paid parental leave and fertility stipend through Carrot.
- 💰 Competitive compensation including meaningful equity.