Member of Technical Staff - Compute Infrastructure
Palo Alto, CA
$180k-$440k / year
Full-time · Artificial Intelligence
💼 About This Role
You'll design and build massive-scale compute clusters and orchestration platforms for frontier AI training and inference at xAI. Your work will directly impact the reliability and performance of next-generation AI systems. This role offers the chance to push beyond existing technologies like Kubernetes in a fast-paced, flat organization.
🎯 What You'll Do
- Build and manage massive-scale clusters for AI workloads.
- Design and extend an in-house container orchestration platform.
- Collaborate with research teams to optimize compute clusters.
- Profile, debug, and resolve system-level performance bottlenecks.
📋 Requirements
- Deep expertise in virtualization technologies (KVM, Xen, QEMU).
- Strong proficiency in C/C++ and Rust.
- Proven track record of profiling and resolving complex system-level performance issues.
- Hands-on experience building or enhancing distributed compute platforms at scale.
✨ Nice to Have
- Experience in Linux kernel development or hypervisor extensions.
- Track record operating large-scale AI training/inference clusters (GPU/TPU).
- Familiarity with performance tools and tracing in production environments.
🎁 Benefits & Perks
- 💰 Equity included in total rewards.
- 🏥 Comprehensive medical, vision, and dental coverage.
- 🏦 401(k) retirement plan.
- 🛡️ Short- and long-term disability insurance.
- 📈 Life insurance and various discounts.