2d ago

Machine Learning Engineer, Core Data

Remote

✨ $180k-$260k / year (est.)

full-time · mid · Remote · ai-ml

💼 About This Role

You'll own the datasets that power our speech systems: auditing and cleaning TTS training corpora and building the tooling around them. You'll develop data quality metrics and classifiers that directly improve model performance and robustness. This role drives the data flywheel for a social AI company pushing creative boundaries.

🎯 What You'll Do

  • Define specs and curate large-scale audio/text datasets
  • Build automated quality gates with dashboards
  • Train lightweight classifiers for data filtering
  • Optimize data mixtures via sampling and active learning

📋 Requirements

  • Experience building ML-driven data quality systems for audio/speech
  • Proficient in Python and PyTorch
  • Audio/speech fundamentals: torchaudio, spectrogram features, VAD
  • Scalable data engineering: Spark/Beam, SQL, Airflow

✨ Nice to Have

  • Shipped datasets or tooling that improved TTS/ASR
  • Built classifiers for LID, speaker verification, or noise detection
  • Ran crowdsourcing annotation with quality control

🎁 Benefits & Perks

  • 🏠 Remote-first culture
  • 🚀 Work on cutting-edge AI
  • 💡 Influence core data strategy
  • 📈 Career growth in a fast-moving startup

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. Take-home Assignment · 3-4 hours
  4. Onsite (Virtual) Final · 3 hours