Member of Technical Staff, Inference
Bay Area
✨ $275k–$350k / year (est.)
Full-time · Senior · Remote
🛠 Tech Stack
Python · C++ · Rust · Go · CUDA · Triton
💼 About This Role
You'll build low-latency inference pipelines for on-device deployment, enabling real-time control loops in robotics. You'll design distributed inference systems on GPU clusters and maximize throughput through efficient resource utilization. The role spans low-level CUDA and Triton kernel work as well as high-level framework integration.
🎯 What You'll Do
- Build low-latency inference pipelines for on-device deployment
- Design and optimize distributed inference systems on GPU clusters
- Implement efficient low-level code (CUDA, Triton, custom kernels)
- Develop monitoring and debugging tools for reliability and determinism
📋 Requirements
- 8+ years of experience in distributed systems or ML infrastructure
- Production-grade expertise in Python and systems languages (C++/Rust/Go)
- Low-level performance mastery in CUDA and Triton
- Proven track record of scaling inference workloads in both cluster and on-device environments
✨ Nice to Have
- Experience with quantization and memory scheduling
- Knowledge of graph compilation techniques
- Background in robotics or real-time systems
🎁 Benefits & Perks
- 💰 Competitive Equity
- 🏖️ Unlimited PTO
- 🏥 Health Insurance