19h ago

Software Engineer - Model Performance

San Francisco

$180k-$360k / year

full-time Hybrid ai-ml

🛠 Tech Stack

💼 About This Role

You'll join the Model Performance team to implement state-of-the-art techniques for ML model inference. You will optimize large language models and dig into codebases such as TensorRT and PyTorch to debug performance issues.

🎯 What You'll Do

  • Implement quantization, speculative decoding, and other inference techniques.
  • Debug ML performance issues in TensorRT, PyTorch, and CUDA codebases.
  • Apply optimization techniques across a range of ML models.
  • Own projects from idea to production.

📋 Requirements

  • Bachelor's or higher in Computer Science, Engineering, or related field.
  • Experience with Python or C++.
  • Familiarity with LLM optimization techniques like quantization.
  • Strong familiarity with PyTorch, TensorRT, or TensorRT-LLM.

✨ Nice to Have

  • Proficiency in improving the performance of LLM software systems.
  • Experience with CUDA or similar technologies.
  • Experience with Docker and Kubernetes.

🎁 Benefits & Perks

  • 📅 Flexible PTO including company-wide Winter Break.
  • 🏥 100% coverage of medical, dental, and vision insurance for employees and their dependents.
  • 👶 Paid parental leave and fertility stipend through Carrot.
  • 💰 Competitive compensation including meaningful equity.