1d ago
Infrastructure Engineer
New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States
$180k-$200k / year
full-time · senior · Remote · ai-ml
💼 About This Role
You'll own image management, system diagnostics, and validation across large-scale GPU-enabled bare-metal infrastructure. You'll develop automation and improve reliability for AI/ML and HPC workloads. This role is critical to ensuring performant infrastructure for demanding AI workloads.
🎯 What You'll Do
- Evolve image management and deployment across bare-metal infrastructure
- Run test clusters for system validation and bring-up
- Diagnose GPU and hardware issues across layers
- Build Python automation for provisioning and validation
📋 Requirements
- 5+ years of infrastructure or systems engineering experience
- Strong Linux systems experience in production
- Hands-on GPU diagnostics with tools like NVIDIA DCGM
- Proficiency in Python for automation
✨ Nice to Have
- Experience with InfiniBand or NVLink
- Experience with PXE boot or image-based provisioning
- Experience with iDRAC, IPMI, or Redfish
🎁 Benefits & Perks
- 🏥 Comprehensive medical, dental, and vision coverage
- 💰 Retirement and financial wellness support
- 🏖️ Generous paid time off plus holidays
📨 Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Interview · 60 min
- 3. Hiring Manager Interview · 45 min