Senior Software Engineer - Model Performance at Inference

San Francisco

$220k-$320k / year

full-time, mid, ai-ml

💼 About This Role

You'll make our inference stack as fast and efficient as possible, working across every layer from CUDA kernels to serving frameworks to eliminate bottlenecks. Your north star is inference performance: latency, throughput, and cost efficiency. You'll have autonomy, a large compute budget, and the technical support to push those limits.

🎯 What You'll Do

Implement optimization techniques like quantization and speculative decoding
Dive deep into inference frameworks to debug and improve performance
Profile and optimize CUDA kernels and GPU utilization
Build tooling and benchmarks to track inference performance

📋 Requirements

2+ years of experience in ML systems or inference optimization
Strong proficiency in Python and familiarity with C++
Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM)
Deep understanding of GPU architecture and profiling GPU workloads

✨ Nice to Have

Experience with CUDA programming
Familiarity with serving non-LLM models (TTS, vision, embeddings)
Contributions to open-source inference frameworks

🎁 Benefits & Perks

💰 Competitive base salary $220,000 - $320,000
📈 Equity in a high-growth startup
🏥 Comprehensive benefits
🏢 In-person collaboration in downtown SF

📨 Hiring Process

[email protected]
