Member of Technical Staff - ML Performance at Modal

New York

$150k-$350k / year

full-time, senior, ai-ml

💼 About This Role

You'll join a fast-growing AI infrastructure company and work on GPU performance optimization for production ML workloads. You'll raise the throughput and cut the latency of diffusion and language models by improving Modal's container runtime. The role also offers the chance to contribute to open-source projects and work with top engineers.

🎯 What You'll Do

Optimize ML inference pipelines for throughput and latency
Debug and improve GPU utilization (SM occupancy, memory bandwidth)
Contribute to open-source inference engines like vLLM or TensorRT
Participate in on-call rotation for production incidents

📋 Requirements

5+ years of experience writing high-performance code
Experience with PyTorch, vLLM, or TensorRT
Familiarity with NVIDIA GPU architecture and CUDA
Proven track record in ML performance engineering

✨ Nice to Have

Familiarity with low-level OS foundations (Linux kernel, file systems, containers)
Experience with open-source contributions

🎁 Benefits & Perks

💰 Competitive salary with equity
🏢 Office in NYC, SF, or Stockholm
🚀 Fast-growing team with career growth opportunities
