10h ago

Site Reliability Engineer

Santa Clara, CA

$135.2k-$176.8k / year

contract · mid · software

💼 About This Role

You'll work in NVIDIA's IPP group to build and stabilize virtualization infrastructure for a private cloud supporting thousands of engineers. You'll ensure fleet reliability and automate deployments across a heterogeneous mix of GPUs and platforms.

🎯 What You'll Do

  • Monitor and recover assets in a private cloud environment with NVIDIA GPUs.
  • Build and stabilize virtualization infrastructure (ESXi, KVM, Hyper-V).
  • Deploy and maintain a large farm of machines using Chef, Ansible, and Terraform.
  • Participate in on-call L1 support for 24/7 monitoring and remediation.

📋 Requirements

  • 5+ years of professional experience with large-scale enterprise production systems.
  • Bachelor's or Master's in CS or equivalent experience.
  • Scripting experience with Python or Go and Unix shell proficiency.
  • Experience with version control systems like Perforce or Git.

✨ Nice to Have

  • Experience with VM and hardware virtualization (VMware, KVM, Hyper-V, Docker, Kubernetes).
  • Background in automating bare-metal and VM provisioning.
  • Development experience in Chef, Ansible, and infrastructure orchestration.

๐ŸŽ Benefits & Perks

  • ๐Ÿ–๏ธ PTO
  • ๐Ÿ’ฐ Competitive pay ($65/hr - $85/hr)
  • ๐Ÿฅ Full benefits
  • ๐Ÿข Amazing company culture

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter phone screen · 30 min
  2. Technical interview · 60 min
  3. Hiring manager interview · 45 min