AI Benchmark Engineer
Japan (Remote)
✨ $80k-$120k / year (est.)
contract · senior · Remote · ai-ml
💼 About This Role
You'll design and build rigorous multilingual benchmark tasks that test how well large language models handle non-English software challenges. Your work will directly measure how robust models are to the language of the prompt and to complex locale and encoding edge cases. This is a remote, freelance opportunity to contribute to cutting-edge AI evaluation.
🎯 What You'll Do
- Design and build realistic task environments using native language datasets
- Develop robust reference implementations and deterministic verifier scripts
- Calibrate task difficulty by running standard configurations against multiple model tiers
- Participate in a rigorous 4-layer quality control process
📋 Requirements
- 5+ years of software engineering experience
- Native or near-native fluency in Japanese and high English proficiency
- Strong proficiency in Python, shell scripting, and data processing
- Deep technical understanding of multilingual text processing including encoding and locale conventions
✨ Nice to Have
- Experience with coding agents and CLI-based development workflows
- Background at top-tier technology companies or engineering universities
🎁 Benefits & Perks
- 🏖️ Flexible schedule: work when you want, no fixed hours
- 💰 Competitive rates with prompt payments
- 🌍 Cutting-edge AI projects that shape human-machine communication
- 🤝 Global community of language and AI professionals
- 📈 Portfolio growth through diverse, innovative projects
📨 Hiring Process
Submit your application with a CV in English, complete a GenAI assessment, then finalize onboarding to become eligible for projects.
🚩 Heads Up
- Role mixes engineering and language specialist expectations
- Freelance/contract with no guaranteed hours or income stability
- Extensive quality control process may lead to unpaid rework