21h ago

Infrastructure Engineer, Lab Manager

San Francisco, CA

$237.6k-$288k / year

full-time, senior, ai-ml

💼 About This Role

You'll lead a team managing a GPU research lab with cutting-edge NVIDIA and AMD systems. You'll diagnose and repair high-performance compute clusters and support new product integration. This role offers the chance to work at the forefront of AI infrastructure.

🎯 What You'll Do

  • Manage a team of two infrastructure engineers and one network engineer.
  • Diagnose and repair hardware faults within GPU racks.
  • Execute component-level diagnosis and remediation for failed hardware.
  • Maintain documentation of maintenance activities in ticketing systems.

📋 Requirements

  • Leadership experience managing high-caliber engineers
  • Experience diagnosing high-density rack-mounted compute hardware
  • GPU platform support (NVIDIA A100, H200, GB200, B200, AMD MI350X/MI355X)
  • Linux command line proficiency (Ubuntu, Rocky Linux, CentOS)

✨ Nice to Have

  • Technical certification or degree in EE/CS or related field
  • Experience working directly with hardware vendors
  • Background in large-scale GPU fleet operations

🎁 Benefits & Perks

  • 💰 Competitive pay
  • 📈 Restricted Stock Units
  • 🏥 Health insurance with HDHP and PPO options
  • 👶 Paid Parental Leave
  • 🏖️ Generous PTO