1d ago

Infrastructure Engineer

New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States

$180k-$200k / year

full-time · senior · Remote · ai-ml

💼 About This Role

You'll own image management, system diagnostics, and validation across large-scale GPU-enabled bare-metal infrastructure. You'll develop automation and improve reliability for AI/ML and HPC workloads. This role is critical to ensuring performant infrastructure for demanding AI workloads.

🎯 What You'll Do

  • Evolve image management and deployment across bare-metal infrastructure
  • Run test clusters for system validation and bring-up
  • Diagnose GPU and hardware issues across layers
  • Build Python automation for provisioning and validation

📋 Requirements

  • 5+ years infrastructure or systems engineering experience
  • Strong Linux systems experience in production
  • Hands-on GPU diagnostics with tools like NVIDIA DCGM
  • Proficiency in Python for automation

✨ Nice to Have

  • Experience with InfiniBand or NVLink
  • Experience with PXE boot or image-based provisioning
  • Experience with iDRAC, IPMI, or Redfish

🎁 Benefits & Perks

  • 🏥 Comprehensive medical, dental, and vision coverage
  • 💰 Retirement and financial wellness support
  • 🏖️ Generous paid time off plus holidays

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. Hiring Manager Interview · 45 min