10h ago

Site Reliability Engineer

Santa Clara, CA

$135.2k-$176.8k / year

contract · mid · software

💼 About This Role

You'll work in NVIDIA's IPP group to build and stabilize virtualization infrastructure for a private cloud supporting thousands of engineers. You'll ensure fleet reliability and automate deployments across a heterogeneous mix of GPUs and platforms.

🎯 What You'll Do

Monitor and recover assets in a private cloud environment with NVIDIA GPUs.
Build and stabilize virtualization infrastructure (ESXi, KVM, Hyper-V).
Deploy and maintain a large farm of machines using Chef, Ansible, and Terraform.
Participate in on-call L1 support for 24/7 monitoring and remediation.

📋 Requirements

5+ years of professional experience with large-scale enterprise production systems.
Bachelor's or Master's degree in CS, or equivalent experience.
Scripting experience with Python or Go and Unix shell proficiency.
Experience with version control systems like Perforce or Git.

✨ Nice to Have

Experience with VM and hardware virtualization (VMware, KVM, Hyper-V, Docker, Kubernetes).
Background in automating bare-metal and VM provisioning.
Development experience in Chef, Ansible, and infrastructure orchestration.

🎁 Benefits & Perks

🏖️ PTO
💰 Competitive pay ($65/hr - $85/hr)
🏥 Full benefits
🏢 Amazing company culture

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

1. Recruiter phone screen · 30 min
2. Technical interview · 60 min
3. Hiring manager interview · 45 min
