3h ago

Member of Technical Staff - Compute Infrastructure

Palo Alto, CA

$180k-$440k / year

full-timeArtificial Intelligence

🛠 Tech Stack

💼 About This Role

You'll design and build massive-scale compute clusters and orchestration platforms for frontier AI training and inference at xAI. Your work will directly impact the reliability and performance of next-generation AI systems. This role offers the chance to push beyond existing technologies like Kubernetes in a fast-paced, flat organization.

🎯 What You'll Do

  • Build and manage massive-scale clusters for AI workloads.
  • Design and extend an in-house container orchestration platform.
  • Collaborate with research teams to optimize compute clusters.
  • Profile, debug, and resolve system-level performance bottlenecks.

📋 Requirements

  • Deep expertise in virtualization technologies (KVM, Xen, QEMU).
  • Strong proficiency in C/C++ and Rust.
  • Proven track record profiling and optimizing complex system-level performance issues.
  • Hands-on experience building or enhancing distributed compute platforms at scale.

✨ Nice to Have

  • Experience in Linux kernel development or hypervisor extensions.
  • Track record operating large-scale AI training/inference clusters (GPU/TPU).
  • Familiarity with performance tools and tracing in production environments.

🎁 Benefits & Perks

  • 💰 Equity included in total rewards.
  • 🏥 Comprehensive medical, vision, and dental coverage.
  • 🏦 401(k) retirement plan.
  • 🛡️ Short and long-term disability insurance.
  • 📈 Life insurance and various discounts.
0 0 0