Member of Technical Staff - Data Quality Engineer (Pre-training)

San Francisco

$200k-$350k / year (est.)

full-time · ai-ml

💼 About This Role

You'll own data quality for LLM pre-training at an AI research company building open superintelligence. You'll design automated quality checks and collaborate with researchers to turn data quality insights into measurable standards that impact model performance.

🎯 What You'll Do

  • Own upstream data quality for LLM pre-training across languages and modalities
  • Partner with research teams to translate requirements into measurable quality signals
  • Design and validate automated QA methods for large-scale data campaigns
  • Build reusable QA pipelines delivering high-quality data to pre-training teams

📋 Requirements

  • Strong engineering fundamentals, with experience building data pipelines or QA systems
  • Proficiency in Python and experience building ML/LLM workflows
  • Experience with large datasets and automated evaluation systems
  • Ability to translate quality concerns into concrete signals and feedback

✨ Nice to Have

  • Experience with LLM-as-a-Judge or model-assisted quality checks
  • Familiarity with how LLMs are trained and evaluated
  • Excellent communication across teams

🎁 Benefits & Perks

  • 💰 Top-tier compensation with salary and equity
  • 🏥 Comprehensive health & wellness coverage
  • 👶 Fully paid parental leave and family planning support
  • 🏖️ Paid time off and relocation support
  • 🍽️ Daily lunch and dinner plus team off-sites