2d ago
Machine Learning Engineer, Core Data
Remote
$180k-$260k / year (est.)
full-time · mid · Remote · ai-ml
About This Role
You'll own the datasets that power our speech systems: auditing and cleaning TTS training corpora and building the tooling around them. You'll develop data quality metrics and classifiers that directly improve model performance and robustness. This role drives the data flywheel for a social AI company pushing creative boundaries.
What You'll Do
- Define specs and curate large-scale audio/text datasets
- Build automated quality gates with dashboards
- Train lightweight classifiers for data filtering
- Optimize data mixtures via sampling and active learning
Requirements
- Experience building ML-driven data quality systems for audio/speech
- Proficient in Python and PyTorch
- Audio/speech fundamentals: torchaudio, spectrogram features, VAD
- Scalable data engineering: Spark/Beam, SQL, Airflow
Nice to Have
- Shipped datasets or tooling that improved TTS/ASR
- Built classifiers for LID, speaker verification, or noise detection
- Ran crowdsourcing annotation with quality control
Benefits & Perks
- Remote-first culture
- Work on cutting-edge AI
- Influence core data strategy
- Career growth in fast-moving startup
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Interview · 60 min
- 3. Take-home Assignment · 3-4 hours
- 4. Onsite (Virtual) Final · 3 hours