AI Benchmark Engineer

Japan (Remote)

$80k-$120k / year (est.)

contract · senior · Remote · ai-ml

💼 About This Role

You'll design and build rigorous multilingual benchmark tasks that test how well large language models handle non-English software challenges. Your work will directly measure multilingual robustness, including the effect of prompt language and complex locale and encoding edge cases. This is a remote, freelance opportunity to contribute to cutting-edge AI evaluation.

🎯 What You'll Do

  • Design and build realistic task environments using native language datasets
  • Develop robust reference implementations and deterministic verifier scripts
  • Calibrate task difficulty by running standard configurations against multiple model tiers
  • Participate in a rigorous 4-layer quality control process

📋 Requirements

  • 5+ years of software engineering experience
  • Native or near-native fluency in Japanese and high English proficiency
  • Strong proficiency in Python, shell scripting, and data processing
  • Deep technical understanding of multilingual text processing including encoding and locale conventions

✨ Nice to Have

  • Experience with coding agents and CLI-based development workflows
  • Background at top-tier technology companies or engineering universities

🎁 Benefits & Perks

  • 🏖️ Flexible schedule: work when you want, no fixed hours
  • 💰 Competitive rates with prompt payments
  • 🌍 Cutting-edge AI projects that shape human-machine communication
  • 🤝 Global community of language and AI professionals
  • 📈 Portfolio growth through diverse, innovative projects

📨 Hiring Process

Submit an application with a CV in English, complete a GenAI assessment, then finalize onboarding to become eligible for projects.

🚩 Heads Up

  • Role mixes engineering and language specialist expectations
  • Freelance/contract with no guaranteed hours or income stability
  • Extensive quality control process may lead to unpaid rework