1d ago
Staff Software Engineer - AI Research Infrastructure
Mountain View, California; New York City, New York; San Francisco, California
$190k-$270k / year
full-time · senior · software
About This Role
You'll design and build large-scale experiment infrastructure for Databricks AI Research, working on distributed training and inference across thousands of GPUs. Your work enables researchers to iterate quickly and brings novel AI from prototype to production.
What You'll Do
- Design and implement infrastructure for large-scale experiments and model training.
- Build job submission, scheduling, and monitoring abstractions for researchers.
- Create tooling to improve research developer productivity and reduce iteration time.
- Influence the long-term roadmap for research computation at Databricks.
Requirements
- BS/MS/PhD in Computer Science or a related field.
- 5+ years of software engineering experience with distributed systems or infrastructure.
- Deep experience with distributed systems, data pipelines, or large-scale backend services involving GPUs or cloud providers.
- Proficiency in systems programming languages (C++, Rust, Go, Java, Scala) to design complex services.
Nice to Have
- Built or contributed to cluster schedulers (Kubernetes, Slurm, Ray, etc.).
- Understanding of ML training and inference workflows (distributed training, model parallelism, fine-tuning).
- Experience driving complex systems from prototype to stable service.
Benefits & Perks
- Equity and annual performance bonus included in total compensation.
- Comprehensive benefits and perks tailored to location.
- Work on cutting-edge AI research with world-class scientists and engineers.
- Inclusive culture with commitment to diversity and equal opportunity.
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Phone Screen · 60 min
- 3. On-site Interviews (4-5 rounds) · 4 hours