1d ago

Staff Infrastructure Engineer, Cluster Infrastructure

San Francisco, CA | New York City, NY | Seattle, WA

$320k-$405k / year

Full-time | Lead | Hybrid | AI/ML | Visa Sponsor

💼 About This Role

You'll own the technical strategy for agent-driven cluster lifecycle management, provisioning high-bandwidth, secure-by-default compute clusters across cloud providers and datacenters. Your work directly enables scaling Claude to millions of users and accelerating AI safety research at a company growing faster than nearly any other.

🎯 What You'll Do

  • Own technical strategy for agent-driven cluster lifecycle management
  • Partner across teams to ingest new compute capacity on time
  • Collaborate on physical build-out and high-bandwidth inter-cluster connectivity
  • Drive strategy on cluster scalability, homogeneity, and fault tolerance

📋 Requirements

  • Deep expertise in distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP/Azure)
  • Strong proficiency in at least one systems language (Rust, Go, or Python) and IaC with Terraform
  • Track record of leading complex, multi-quarter technical initiatives spanning multiple teams or systems
  • Ability to build alignment across senior stakeholders and communicate effectively at all levels

✨ Nice to Have

  • 8+ years of software engineering experience, including a technical lead role
  • Experience operating compute infrastructure at hyperscale (100+ clusters, 10K+ nodes)
  • Depth in Kubernetes internals, cluster provisioning, or orchestration systems

🎁 Benefits & Perks

  • 🏖️ Unlimited PTO
  • 🏥 Comprehensive health insurance
  • 💰 Annual salary $320k-$405k
  • 🚀 Equity packages
  • 🔄 Visa sponsorship available

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter screen · 30 min
  2. Technical phone interview · 60 min
  3. Onsite interviews (3-4 rounds) · 4 hours