Staff Machine Learning Engineer, Agentic

Bellevue, WA; Menlo Park, CA
Full-time, Senior, Hybrid, Fintech

Description

You will define and uphold the quality bar for agentic systems across Robinhood. This means designing evaluation frameworks, guiding model selection, and partnering with product, data science, and engineering teams so that these systems meet standards for correctness, safety, latency, and user satisfaction.

Requirements

  • Deep experience defining and measuring quality for agentic or ML systems using evaluation frameworks, datasets, and scorecards
  • Experience evaluating large language models, including tradeoffs in performance, cost, and latency
  • Ability to analyze production issues and lead initiatives improving system quality across multiple teams
  • Comfortable working with engineers, data scientists, and product partners to deliver measurable improvements
  • Experience in regulated environments or with AI evaluation and observability tools (nice to have)

Responsibilities

  • Define and implement evaluation frameworks for agent performance, including task success, correctness, tool usage reliability, latency, safety, and user satisfaction
  • Evaluate frontier and fine-tuned models across quality, latency, cost, and edge cases
  • Partner with product managers, data scientists, and engineers to translate evaluation results into launch criteria
  • Analyze production issues, identify root causes, and prioritize improvements for system reliability
  • Build visibility into agent performance through metrics, monitoring, and reporting