AI Evaluator for Software Engineering

Miami

$208k-$416k / year

Contract · Senior · Remote · AI/ML

💼 About This Role

You'll evaluate how AI coding agents like OpenAI Codex and Claude Code behave in real-world scenarios. Your core impact is judging whether model responses reflect strong engineering judgment and taste, not just syntactic correctness. This role stands out because you'll help define what great interaction looks like in modern AI-assisted development workflows.

🎯 What You'll Do

  • Evaluate AI-generated coding interactions end-to-end
  • Assess quality of explanations and reasoning, not just code
  • Distinguish between different levels of response quality
  • Provide clear, opinionated feedback on what worked or felt off

📋 Requirements

  • Staff/Principal-level engineer (or equivalent) with strong background in TypeScript/JavaScript or Python
  • Hands-on experience using OpenAI Codex, Claude Code, or Cursor
  • Deep familiarity with modern AI-assisted dev workflows
  • Comfortable giving direct, opinionated feedback

✨ Nice to Have

  • Experience with tools like Cursor or similar AI-first IDEs
  • Prior exposure to prompt design or evaluation workflows
  • Experience mentoring senior engineers or defining engineering standards

🎁 Benefits & Perks

  • 💰 $100–$200/hour rate
  • 10–20 hours/week flexible schedule
  • 🚀 Start ASAP through early May with possible extension

📨 Hiring Process

Estimated timeline: 1-2 weeks

  1. Take-home evaluation exercise
  2. One behavioral interview