Software Engineer, Workload Enablement
San Francisco
$293k-$455k / year
Full-time · Senior · Hybrid · AI/ML
💼 About This Role
You'll join the Scaling team to enable production workloads and end-to-end testing on new AI platforms. You'll port inference and training workloads, build benchmarks, and deep-dive into performance bottlenecks for distributed training at scale. This role offers direct impact on the infrastructure behind cutting-edge AI models.
🎯 What You'll Do
- Port and validate inference/training workloads on new platforms
- Build benchmark suites capturing real end-to-end behavior
- Deep-dive into performance bottlenecks in distributed training and inference
- Create test harnesses that run in CI and lab environments
📋 Requirements
- BS in CS/EE or equivalent practical experience
- 5+ years in ML systems, performance engineering, distributed systems, or HPC
- Strong hands-on experience with PyTorch and modern LLM stacks
- Experience with RDMA and with debugging and optimizing communication libraries (NCCL/RCCL)
✨ Nice to Have
- Experience building workload-shaped benchmarks and stress tests
- Familiarity with RDMA networking and transport tuning
- Experience running workloads in Kubernetes
🎁 Benefits & Perks
- 💰 Competitive equity included in total compensation
- 🏖️ Flexible time off
- 🏥 Comprehensive health insurance