Member of Technical Staff, Inference

Bay Area

$275k-$350k / year (est.)

Full-time · Senior · Remote

💼 About This Role

You'll build low-latency inference pipelines for on-device deployment, enabling real-time control loops in robotics. You'll also design distributed inference systems on GPU clusters, maximizing throughput through efficient resource utilization. The role combines low-level CUDA and Triton work with high-level framework integration.

🎯 What You'll Do

  • Build low-latency inference pipelines for on-device deployment
  • Design and optimize distributed inference systems on GPU clusters
  • Implement efficient low-level code (CUDA, Triton, custom kernels)
  • Develop monitoring and debugging tools for reliability and determinism

📋 Requirements

  • 8+ years of experience in distributed systems or ML infrastructure
  • Production-grade expertise in Python and systems languages (C++/Rust/Go)
  • Low-level performance mastery in CUDA and Triton
  • Proven track record scaling inference workloads in cluster and on-device environments

✨ Nice to Have

  • Experience with quantization and memory scheduling
  • Knowledge of graph compilation techniques
  • Background in robotics or real-time systems

🎁 Benefits & Perks

  • 💰 Competitive Equity
  • 🏖️ Unlimited PTO
  • 🏥 Health Insurance