1d ago
Staff Software Engineer, Node Infra
San Francisco, CA
$320k-$405k / year
full-timelead Hybridai-ml Visa Sponsor
๐ Tech Stack
๐ผ About This Role
You'll own the technical strategy for node lifecycle management at a leading AI company, ensuring reliable infrastructure for frontier AI research. Your work will directly impact the scalability and uptime of clusters powering millions of users. This role offers deep technical ownership and intersection of systems and ML.
๐ฏ What You'll Do
- Own technical strategy for node lifecycle management
- Design and operate automated hardware health detection and remediation
- Drive cross-team initiatives to scale AI clusters across cloud providers
- Set infrastructure architecture and establish operational excellence practices
๐ Requirements
- Deep expertise in distributed systems and cloud platforms (Kubernetes, AWS/GCP/Azure)
- Strong proficiency in a systems language like Rust, Go, or Python
- Hands-on experience with ML accelerators (GPUs, TPUs, Trainium)
- Track record leading complex, multi-quarter technical initiatives
โจ Nice to Have
- 8+ years of software engineering experience as a technical lead
- Experience managing 10K+ node infrastructure
- Low-level systems experience: kernel, virtualization, device drivers
๐ Benefits & Perks
- ๐๏ธ Flexible PTO
- ๐ฅ Comprehensive health insurance
- ๐ฐ Competitive equity package
- ๐ Learning and development budget
- ๐ฝ๏ธ Daily meals and snacks
๐จ Hiring Process
Estimated timeline: 2-4 weeks ยท AI estimate
- 1Recruiter Screenยท 30 min
- 2Technical Phone Screenยท 45 min
- 3On-site Interviewsยท 4 hours
0 0 0