10h ago
Site Reliability Engineer
Santa Clara, CA
$135.2k-$176.8k / year
contract · mid · software
About This Role
You'll work in NVIDIA's IPP group to build and stabilize virtualization infrastructure for a private cloud supporting thousands of engineers. You'll ensure fleet reliability and automate deployments across a heterogeneous mix of GPUs and platforms.
What You'll Do
- Monitor and recover assets in a private cloud environment with NVIDIA GPUs.
- Build and stabilize virtualization infrastructure (ESXi, KVM, Hyper-V).
- Deploy and maintain a large farm of machines using Chef, Ansible, and Terraform.
- Participate in on-call L1 support for 24/7 monitoring and remediation.
Requirements
- 5+ years of professional experience with large-scale enterprise production systems.
- Bachelor's or Master's degree in CS, or equivalent experience.
- Scripting experience with Python or Go and Unix shell proficiency.
- Experience with version control systems such as Perforce or Git.
Nice to Have
- Experience with VM and hardware virtualization (VMware, KVM, Hyper-V, Docker, Kubernetes).
- Background in automating bare-metal and VM provisioning.
- Development experience with Chef, Ansible, and infrastructure orchestration.
Benefits & Perks
- PTO
- Competitive pay ($65/hr - $85/hr)
- Full benefits
- Amazing company culture
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
1. Recruiter phone screen · 30 min
2. Technical interview · 60 min
3. Hiring manager interview · 45 min