AI QA Trainer - LLM Evaluation - Freelance Project
Worldwide - Remote
$6-$65 / hour
Contract - Senior - Remote
Description
You'll challenge advanced language models on tasks like hallucination detection and factual consistency, documenting failure modes to improve model quality. On a typical day, you will converse with the model on real-world scenarios, verify factual accuracy, design test plans, and suggest improvements to prompt engineering and evaluation metrics.
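As a rough illustration of the factual-consistency checks described above, the sketch below flags a model answer that fails to mention known reference facts. All names (`check_factual_consistency`, the sample facts) are illustrative assumptions, not part of any tooling named in this posting.

```python
# Naive factual-consistency sketch (hypothetical helper): an answer passes
# only if every reference fact appears in it, case-insensitively. Real
# evaluation pipelines use far more robust matching (entailment models,
# claim extraction); this only shows the pass/fail documentation shape.
def check_factual_consistency(answer: str, reference_facts: list[str]) -> dict:
    missing = [f for f in reference_facts if f.lower() not in answer.lower()]
    return {"pass": not missing, "missing_facts": missing}

# Example: a correct answer passes; an off-topic answer records what is missing.
ok = check_factual_consistency(
    "The Eiffel Tower is in Paris and opened in 1889.", ["paris", "1889"]
)
bad = check_factual_consistency("It is in London.", ["paris"])
```

Recording the `missing_facts` list alongside the verdict is what makes a failure documentable rather than just a binary score.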
Requirements
- Bachelor's, master's, or PhD in CS, data science, or related field
- Experience in QA for ML/AI systems, safety/red-team, or test automation
- Hands-on work with LLM eval tooling
Responsibilities
- Evaluate LLMs on hallucination detection and factual consistency
- Design and run test plans and regression suites
- Build clear rubrics and pass/fail criteria
- Capture reproducible error traces with root-cause hypotheses
- Partner on adversarial red-teaming and automation
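The rubric, pass/fail, and error-trace responsibilities above could be sketched roughly like this; the rubric criteria and record fields are illustrative assumptions, not the team's actual schema.

```python
# Hypothetical rubric sketch: each criterion is a named predicate over the
# model transcript. A failure produces a record that captures enough context
# (prompt, sampling seed, failed criteria) to reproduce and triage the error.
RUBRIC = {
    "no_placeholder_citation": lambda t: "[citation needed]" not in t,
    "non_empty_answer": lambda t: len(t.strip()) > 0,
}

def evaluate(prompt: str, transcript: str, seed: int = 0) -> dict:
    failures = [name for name, check in RUBRIC.items() if not check(transcript)]
    return {
        "prompt": prompt,
        "seed": seed,  # sampling seed recorded so the trace is reproducible
        "pass": not failures,
        "failed_criteria": failures,
    }

# Example: a transcript with a placeholder citation fails that criterion.
trace = evaluate("Who designed the Eiffel Tower?",
                 "Gustave Eiffel's firm [citation needed]", seed=42)
```

Keeping criteria as small named predicates makes it easy to grow the rubric into a regression suite: each documented failure mode becomes one more entry in `RUBRIC`.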