Senior Staff Engineer, Cloud Site Operations

San Francisco, CA - US

$179k-$218k / year

full-time · lead · ai-ml

💼 About This Role

You'll be the technical architect and strategic partner to the Director of Data Center Operations, ensuring our AI fleet is the most reliable and maintainable in the world. You'll bridge hardware engineering and ground-level execution, focusing on operational maturity and technical governance for our global white space. This role offers a chance to shape the infrastructure powering the AI revolution.

🎯 What You'll Do

  • Oversee the technical health of the global ticket queue and develop real-time dashboards
  • Partner with Fleet Engineering to define software access and diagnostic tooling
  • Lead end-to-end power topology mapping and build vs. buy analysis
  • Architect business continuity and disaster recovery frameworks for AI Cloud

📋 Requirements

  • 10+ years in Data Center Operations, Systems Engineering, or HPC hardware
  • Expert-level understanding of x86/GPU server architecture and electrical distribution
  • Proven experience in hardware maintenance at scale with high-density AI infrastructure
  • Expert proficiency in defining operational KPIs and building dashboards (e.g., Tableau, Grafana)

✨ Nice to Have

  • Familiarity with NVIDIA H200 and Blackwell (GB200) systems
  • Experience performing build vs. buy analyses for infrastructure tools
  • Mentorship experience with senior technicians and site leads

🎁 Benefits & Perks

  • 💰 Restricted Stock Units included in all offers
  • 🏖️ Paid time off & paid holidays
  • 🏥 Comprehensive health, dental & vision insurance with HSA employer contributions
  • 🍼 Paid parental leave
  • 📚 Professional development & tuition reimbursement