12h ago
Senior AI Researcher - Pre-training Data
Heidelberg
$120k-$180k / year (est.)
full-time · senior · Hybrid · ai-ml
About This Role
You'll shape the scientific methodology behind pre-training data for large language models, designing ablations and novel algorithms to improve model capabilities. You'll collaborate with engineers and researchers to build scalable data pipelines that directly impact shipped models.
What You'll Do
- Innovate in data-centric AI: identify and implement novel approaches to data quality and curriculum learning.
- Design and lead rigorous ablation studies across various scales to analyze data composition effects.
- Develop advanced algorithms for scoring and selecting data, such as influence functions or gradient-based matching.
- Collaborate cross-functionally to scale research from prototypes to trillion-token pipelines.
Requirements
- Deep understanding of machine learning theory as it applies to foundation models, including training dynamics and scaling laws.
- Experience designing complex ML experiments related to data composition or curriculum learning.
- Strong Python skills and comfort with PyTorch and other deep learning frameworks.
- Willingness to relocate to Heidelberg or travel there fortnightly.
Nice to Have
- PhD in machine learning, NLP, or equivalent research experience.
- Contributions to top-tier venues like NeurIPS, ICML, ICLR, or ACL on data curation or LLM pre-training.
- Experience training foundation models from scratch and diagnosing data-induced pathologies.
Benefits & Perks
- 30 days paid vacation
- Wellhub fitness & wellness membership
- nilo.health mental health support
- Substantially subsidized company pension plan
- Subsidized Germany-wide transportation ticket
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
1. Recruiter Call · 30 min
2. Technical Interview · 60 min
3. On-site or Virtual Final Round · 90 min