Member of Technical Staff, Training
Bay Area
✨ $200k-$350k / year (est.)
full-time · senior · ai-ml
💼 About This Role
You'll drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack. Your work will span from data pipelines to GPU kernels, optimizing for scalability and efficiency.
🎯 What You'll Do
- Profile and eliminate bottlenecks in the model training stack
- Design and optimize distributed training systems for GPU clusters
- Implement efficient low-level code (CUDA, Triton, and custom kernels)
- Develop monitoring and debugging tools for large-scale runs
📋 Requirements
- 8+ years of experience in distributed systems or HPC
- Production-grade Python expertise
- Low-level performance mastery with CUDA/cuDNN/Triton
- Experience with PyTorch and model parallelism