Research Engineer, Evaluations
Remote - New York
Full-time · Senior · Remote · Voice AI
Description
You will own the evaluation infrastructure for streaming speech-to-text models, ensuring we measure the right things and benchmark against competitors. You'll translate customer feedback into quantifiable metrics, manage datasets, and maintain evaluation pipelines to accelerate research.
Requirements
- Machine Learning / Research Engineering background
- Experience with evaluation benchmarking and metrics development
- Ability to communicate with customer-facing teams and researchers
- Familiarity with voice agent ecosystems (e.g., LiveKit, Pipecat, Vapi)
- Strong analytical skills to convert vague feedback into concrete metrics
Responsibilities
- Own end-to-end and integration-level model evaluation for accuracy, latency, and feature-specific metrics
- Build and maintain competitive benchmarking pipelines against other providers
- Design and run systematic experiments to measure the impact of model changes
- Onboard, curate, and maintain evaluation datasets including public benchmarks and internal test sets
- Define evaluation metrics capturing real-world performance and translate qualitative customer feedback into quantifiable criteria
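To make the accuracy side of these responsibilities concrete: speech-to-text evaluation typically starts from word error rate (WER), the edit distance between a reference transcript and the model's hypothesis, normalized by reference length. Below is a minimal, self-contained sketch; it is illustrative only and not tied to any particular evaluation pipeline.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion over 6 words
```

In production pipelines this is usually preceded by text normalization (casing, punctuation, number formatting), since normalization choices can shift WER more than the model change under test.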