Member of Technical Staff, Training

Bay Area

$200k–$350k / year (est.)

full-time · senior · ai-ml

💼 About This Role

You'll drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack. Your work will span from data pipelines to GPU kernels, optimizing for scalability and efficiency.

🎯 What You'll Do

  • Profile and eliminate bottlenecks across the model training stack
  • Design and optimize distributed training systems for GPU clusters
  • Implement efficient low-level code (CUDA, Triton, custom kernels)
  • Develop monitoring and debugging tools for large-scale runs
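
To give a flavor of the profiling work above, here is a minimal, framework-free sketch that times the data-loading and compute phases of a training loop separately to locate the bottleneck. All names and numbers here are illustrative, not part of this role's actual stack; real work at this level would use tools like torch.profiler or Nsight Systems rather than wall-clock timers:

```python
import time
from collections import defaultdict

def profile_training_loop(load_batch, train_step, num_steps=100):
    """Time each phase of a toy training loop to find where steps spend time."""
    totals = defaultdict(float)
    for _ in range(num_steps):
        t0 = time.perf_counter()
        batch = load_batch()       # data-pipeline phase
        t1 = time.perf_counter()
        train_step(batch)          # compute phase
        t2 = time.perf_counter()
        totals["data_loading"] += t1 - t0
        totals["compute"] += t2 - t1
    return dict(totals)

# Toy stand-ins: a deliberately slow loader and a cheap step,
# so data loading shows up as the bottleneck.
times = profile_training_loop(
    load_batch=lambda: time.sleep(0.002) or list(range(256)),
    train_step=lambda batch: sum(batch),
    num_steps=50,
)
bottleneck = max(times, key=times.get)
print(bottleneck)
```

The same phase-level breakdown generalizes: once the dominant phase is known, the fix is overlap (prefetching data while the GPU computes) or making that phase itself faster (e.g., a fused kernel for the compute path).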

📋 Requirements

  • 8+ years of experience in distributed systems or HPC
  • Production-grade Python expertise
  • Low-level performance mastery with CUDA/cuDNN/Triton
  • Experience with PyTorch and model parallelism