Software Engineer, Workload Enablement

San Francisco

$293k-$455k / year

Full-time · Senior · Hybrid · AI/ML

💼 About This Role

You'll join the Scaling team to enable production workloads and end-to-end testing on new AI platforms. You'll port inference and training workloads, build benchmarks, and dig into performance bottlenecks in distributed training at scale. The role has direct impact on the infrastructure behind cutting-edge AI models.

🎯 What You'll Do

  • Port and validate inference/training workloads on new platforms
  • Build benchmark suites capturing real end-to-end behavior
  • Deep-dive into the performance of distributed training and inference
  • Create test harnesses that run in CI and lab environments

📋 Requirements

  • BS in CS/EE or equivalent practical experience
  • 5+ years in ML systems, performance engineering, distributed systems, or HPC
  • Strong hands-on experience with PyTorch and modern LLM stacks
  • Experience with RDMA and with debugging and optimizing communication libraries (NCCL/RCCL)

✨ Nice to Have

  • Experience building workload-shaped benchmarks and stress tests
  • Familiarity with RDMA networking and transport tuning
  • Experience running workloads in Kubernetes

🎁 Benefits & Perks

  • 💰 Competitive equity included in total compensation
  • 🏖️ Flexible time off
  • 🏥 Comprehensive health insurance