
AI QA Trainer - LLM Evaluation - Freelance Project

Worldwide - Remote

$6-$65 / hour

Contract - Senior - Remote

Description

You'll challenge advanced language models on tasks such as hallucination detection and factual consistency, documenting failure modes to improve model quality. On a typical day, you will converse with the model about real-world scenarios, verify factual accuracy, design test plans, and suggest improvements to prompt engineering and evaluation metrics.

Requirements

  • Bachelor's, master's, or PhD in CS, data science, or related field
  • Experience in QA for ML/AI systems, safety/red-team, or test automation
  • Hands-on work with LLM eval tooling

Responsibilities

  • Evaluate LLMs on hallucination detection and factual consistency
  • Design and run test plans and regression suites
  • Build clear rubrics and pass/fail criteria
  • Capture reproducible error traces with root-cause hypotheses
  • Partner on adversarial red-teaming and automation
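By way of illustration, a rubric with pass/fail criteria and a reproducible error trace (as described above) can be as simple as the sketch below. This is a hypothetical example, not the project's actual tooling; all function and field names are invented for illustration.

```python
# Hypothetical sketch of a pass/fail rubric check for factual consistency.
# All names are illustrative, not part of any real evaluation framework.

def evaluate_response(response: str, required_facts: list[str],
                      forbidden_claims: list[str]) -> dict:
    """Score one model response against a simple rubric:
    every required fact must appear, and no forbidden
    (known-false, i.e. hallucinated) claim may appear."""
    text = response.lower()
    missing = [f for f in required_facts if f.lower() not in text]
    hallucinated = [c for c in forbidden_claims if c.lower() in text]
    return {
        "pass": not missing and not hallucinated,
        "missing_facts": missing,              # evidence for the error trace
        "hallucinated_claims": hallucinated,   # evidence for the error trace
    }

# Example: a response that invents a date fails the rubric.
result = evaluate_response(
    "The Eiffel Tower is in Paris and was completed in 1850.",
    required_facts=["Paris"],
    forbidden_claims=["1850"],  # known-false completion date used as a trap
)
print(result["pass"])  # False: the hallucinated date was detected
```

Real evaluation suites replace the substring match with semantic checks (entailment models, LLM-as-judge), but the structure — rubric in, boolean verdict plus machine-readable evidence out — is the same.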