Software Engineer - Model Performance at Baseten — CareerPair

19h ago

Software Engineer - Model Performance

San Francisco

$180k-$360k / year

Full-time · Hybrid · AI/ML

💼 About This Role

You'll join the Model Performance team, implementing state-of-the-art techniques for ML model inference. You will optimize large language models (LLMs) and dig into codebases such as TensorRT and PyTorch to debug performance issues.

🎯 What You'll Do

Implement quantization, speculative decoding, and other inference techniques.
Debug ML performance issues in TensorRT, PyTorch, and CUDA codebases.
Apply optimization techniques across a range of ML models.
Own projects from idea to production.

📋 Requirements

Bachelor's degree or higher in Computer Science, Engineering, or a related field.
Experience with Python or C++.
Familiarity with LLM optimization techniques like quantization.
Strong familiarity with PyTorch, TensorRT, or TensorRT-LLM.

✨ Nice to Have

Proficiency in improving the performance of LLM software systems.
Experience with CUDA or similar technologies.
Experience with Docker and Kubernetes.

🎁 Benefits & Perks

📅 Flexible PTO including company-wide Winter Break.
🏥 100% coverage of medical, dental, and vision insurance for employee and dependents.
👶 Paid parental leave and a fertility stipend through Carrot.
💰 Competitive compensation including meaningful equity.
