Research Engineer, Evaluations

Remote - New York
Full-time · Senior · Remote · Voice AI

Description

You will own the evaluation infrastructure for streaming speech-to-text models, ensuring we measure what matters and benchmark against competitors. You'll translate customer feedback into quantifiable metrics, manage evaluation datasets, and maintain evaluation pipelines that accelerate research.

Requirements

  • Machine Learning / Research Engineering background
  • Experience with evaluation benchmarking and metrics development
  • Ability to communicate effectively with both customer-facing teams and researchers
  • Familiarity with voice agent ecosystems (e.g., LiveKit, Pipecat, Vapi)
  • Strong analytical skills to convert vague feedback into concrete metrics

Responsibilities

  • Own end-to-end and integration-level model evaluation for accuracy, latency, and feature-specific metrics
  • Build and maintain competitive benchmarking pipelines against other providers
  • Design and run systematic experiments to measure the impact of model changes
  • Onboard, curate, and maintain evaluation datasets including public benchmarks and internal test sets
  • Define evaluation metrics capturing real-world performance and translate qualitative customer feedback into quantifiable criteria