4h ago

Member of Technical Staff (Infrastructure): World Models

Remote

$150k-$250k / yearest.

full-timemid Remoteai-ml

🛠 Tech Stack

💼 About This Role

You'll design and scale GPU infrastructure for AI training and inference. Your core impact is enabling world model research through reliable compute. This role offers deep ownership of high-performance clusters and collaboration with top AI researchers.

🎯 What You'll Do

  • Design scheduling for inference and training on shared GPU clusters
  • Build and operate GPU infrastructure across thousands of GPUs
  • Own GPU utilization and cost as key metrics
  • Participate in on-call rotation for reliability

📋 Requirements

  • Linux-native systems programming with kernel-level debugging
  • Kubernetes and Slurm cluster engineering for 1000+ GPUs
  • Distributed systems design and operation at scale
  • Production monitoring and incident response experience

✨ Nice to Have

  • ML model training and inference workload understanding
  • Resource-constrained scheduling experience (HPC, trading)

🎁 Benefits & Perks

  • 💰 Competitive salary and equity
  • 🏥 Private health coverage
  • 🏖️ Fully-distributed, async-first culture
  • 💻 Hardware setup of your choice
  • 📱 Stipends for phone, internet, and meals

🚩 Heads Up

  • Emphasizes 'Olympic athlete' dedication with late nights and weekends
  • Mentions hybrid but infers fully remote with travel expectations
0 0 0