Member of Technical Staff - ML Performance

New York

$150k-$350k / year

full-time · senior · ai-ml

💼 About This Role

You'll join a fast-growing AI infrastructure company, working on GPU performance optimization for production ML workloads. By improving Modal's container runtime, you'll help diffusion and language models achieve higher throughput and lower latency. The role also offers the chance to contribute to open-source projects and work alongside top engineers.

🎯 What You'll Do

  • Optimize ML inference pipelines for throughput and latency
  • Debug and improve GPU utilization (SM occupancy, memory bandwidth)
  • Contribute to open-source inference engines like vLLM or TensorRT
  • Participate in on-call rotation for production incidents

📋 Requirements

  • 5+ years of experience writing high-performance code
  • Experience with PyTorch, vLLM, or TensorRT
  • Familiarity with NVIDIA GPU architecture and CUDA
  • Proven track record in ML performance engineering

✨ Nice to Have

  • Familiarity with low-level OS foundations (Linux kernel, file systems, containers)
  • Experience with open-source contributions

🎁 Benefits & Perks

  • 💰 Competitive salary with equity
  • 🏢 Office in NYC, SF, or Stockholm
  • 🚀 Fast-growing team with career growth opportunities