12h ago

Senior AI Researcher - Pre-training Data

Heidelberg

✨ $120k-$180k / year (est.)

full-time · senior · Hybrid · ai-ml

💼 About This Role

You'll shape the scientific methodology behind pre-training data for large language models, designing ablations and novel algorithms to improve model capabilities. You'll collaborate with engineers and researchers to build scalable data pipelines that directly impact shipped models.

🎯 What You'll Do

  • Innovate in data-centric AI: identify and implement novel approaches to data quality and curriculum learning.
  • Design and lead rigorous ablation studies across various scales to analyze data composition effects.
  • Develop advanced algorithms for scoring and selecting data, such as influence functions or gradient-based matching.
  • Collaborate cross-functionally to scale research from prototypes to trillion-token pipelines.

📋 Requirements

  • Deep understanding of machine learning theory as it applies to foundation models, including training dynamics and scaling laws.
  • Experience designing complex ML experiments related to data composition or curriculum learning.
  • Strong Python skills and comfort with PyTorch and other deep learning frameworks.
  • Willingness to relocate to Heidelberg or travel fortnightly.

✨ Nice to Have

  • PhD in machine learning, NLP, or equivalent research experience.
  • Contributions to top-tier venues like NeurIPS, ICML, ICLR, or ACL on data curation or LLM pre-training.
  • Experience training foundation models from scratch and diagnosing data-induced pathologies.

🎁 Benefits & Perks

  • 🏖️ 30 days paid vacation
  • 🏋️ Wellhub fitness & wellness membership
  • 🧠 nilo.health mental health support
  • 💰 Substantially subsidized company pension plan
  • 🚆 Subsidized Germany-wide transportation ticket

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Call · 30 min
  2. Technical Interview · 60 min
  3. On-site or Virtual Final Round · 90 min