21h ago
Infrastructure Engineer, Lab Manager
San Francisco, CA
$237.6k-$288k / year
full-time, senior, ai-ml
💼 About This Role
You'll lead a team that manages a GPU research lab running current-generation NVIDIA and AMD systems. You'll diagnose and repair high-performance compute clusters and support new product integration. This role offers the chance to work at the forefront of AI infrastructure.
🎯 What You'll Do
- Manage a team of two infrastructure engineers and one network engineer.
- Diagnose and repair hardware faults within GPU racks.
- Execute component-level diagnosis and remediation for failed hardware.
- Maintain documentation of maintenance activities in ticketing systems.
📋 Requirements
- Leadership experience managing high-caliber engineers
- Experience diagnosing high-density, rack-mounted compute hardware
- GPU platform support (NVIDIA A100, H200, GB200, B200; AMD MI350X/MI355X)
- Linux command line proficiency (Ubuntu, Rocky Linux, CentOS)
✨ Nice to Have
- Technical certification or degree in EE/CS or related field
- Experience working directly with hardware vendors
- Background in large-scale GPU fleet operations
🎁 Benefits & Perks
- 💰 Competitive pay
- 📈 Restricted Stock Units
- 🏥 Health insurance with HDHP and PPO options
- 👶 Paid Parental Leave
- 🏖️ Generous PTO