Senior Software Engineer - Model Performance

San Francisco

$220k-$320k / year

full-time, mid, ai-ml

💼 About This Role

You'll make our inference stack as fast and efficient as possible, working at every layer, from CUDA kernels to serving frameworks, to eliminate bottlenecks. Your north star is inference performance: latency, throughput, and cost efficiency. You'll have autonomy, a large compute budget, and the technical support you need to push those limits.

🎯 What You'll Do

  • Implement optimization techniques like quantization and speculative decoding
  • Dive deep into inference frameworks to debug and improve performance
  • Profile and optimize CUDA kernels and GPU utilization
  • Build tooling and benchmarks to track inference performance

📋 Requirements

  • 2+ years of experience in ML systems or inference optimization
  • Strong proficiency in Python and familiarity with C++
  • Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM)
  • Deep understanding of GPU architecture and profiling GPU workloads

✨ Nice to Have

  • Experience with CUDA programming
  • Familiarity with serving non-LLM models (TTS, vision, embeddings)
  • Contributions to open-source inference frameworks

🎁 Benefits & Perks

  • 💰 Competitive base salary $220,000 - $320,000
  • 📈 Equity in a high-growth startup
  • 🏥 Comprehensive benefits
  • 🏢 In-person collaboration in downtown SF

📨 Hiring Process

[email protected]