14h ago
Staff Software Engineer - AI Research Infrastructure
New York City, New York | San Francisco, California
$199k-$270k / year
full-time · senior · software
About This Role
You'll design and build infrastructure to power large-scale AI experiments across thousands of GPUs at Databricks AI Research. You'll partner with research scientists to turn experimental workloads into robust pipelines and push the limits of what our infrastructure can support.
What You'll Do
- Design and implement infrastructure for large-scale experiments and model training
- Build job submission, scheduling, and monitoring abstractions
- Create tooling for experiment management and CI/testing for research code
- Influence long-term roadmap for research computation
Requirements
- BS/MS or PhD in Computer Science or related field
- 5+ years of software engineering experience including large-scale distributed systems
- Deep experience with distributed systems and infrastructure (GPUs, clusters, cloud)
- Proficient in systems languages (C++, Rust, Go, Java, Scala)
Nice to Have
- Experience with cluster schedulers or job orchestration (Kubernetes, Slurm, Ray)
- Understanding of modern ML training and inference workflows
- Experience driving complex systems from prototype to stable service
Benefits & Perks
- Annual Performance Bonus
- Equity
- Comprehensive Benefits per region
Hiring Process
Estimated timeline: 2-4 weeks (AI estimate)
- 1. Recruiter Screen · 30 min
- 2. Technical Phone Screen · 60 min
- 3. Onsite Interviews · 4 hours