AI Benchmark Engineer at LILT (Production)

6h ago

AI Benchmark Engineer

Japan (Remote)

✨ $80k-$120k / year (est.)

contract · senior · Remote · ai-ml

💼 About This Role

You'll design and build rigorous multilingual benchmark tasks that test how well large language models handle non-English software challenges. Your work will directly measure how robust models are to the language of the prompt and to complex locale and encoding edge cases. This is a remote, freelance opportunity to contribute to cutting-edge AI evaluation.

🎯 What You'll Do

Design and build realistic task environments using native language datasets
Develop robust reference implementations and deterministic verifier scripts
Calibrate task difficulty by running standard configurations against multiple model tiers
Participate in a rigorous 4-layer quality control process
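
To give a feel for the "deterministic verifier scripts" mentioned above, here is a minimal sketch in Python. It is illustrative only and not LILT's actual tooling: the function names (`decode_output`, `canonical`, `verify`), the fixed encoding order, and the choice of NFKC normalization are all assumptions. A real task would more likely declare its expected encoding than guess it.

```python
import unicodedata

# Hypothetical candidate encodings, tried in a fixed order so that the same
# bytes always decode the same way (an assumption, not a project rule).
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "euc_jp")


def decode_output(raw: bytes, encodings=CANDIDATE_ENCODINGS) -> str:
    """Decode raw bytes with the first encoding that accepts them."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("output is not valid in any expected encoding")


def canonical(text: str) -> str:
    """Fold width variants with NFKC and normalize line endings and
    trailing whitespace, so cosmetic differences do not fail a task."""
    text = unicodedata.normalize("NFKC", text)
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    return "\n".join(lines).strip()


def verify(candidate: bytes, reference: str) -> bool:
    """Return True when the candidate matches the reference after
    decoding and canonicalization. No randomness, no clock, no network."""
    return canonical(decode_output(candidate)) == canonical(reference)
```

For example, `verify("日本語".encode("shift_jis"), "日本語")` accepts a Shift_JIS-encoded answer, and half-width katakana with full-width digits compare equal to their standard forms after NFKC folding. Whether such folding is appropriate depends on the task: a benchmark that specifically tests width handling would want a stricter comparison.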

📋 Requirements

5+ years of software engineering experience
Native or near-native fluency in Japanese and high English proficiency
Strong proficiency in Python, shell scripting, and data processing
Deep technical understanding of multilingual text processing including encoding and locale conventions

✨ Nice to Have

Experience with coding agents and CLI-based development workflows
Background at top-tier technology companies or engineering universities

🎁 Benefits & Perks

🏖️ Flexible schedule: work when you want, no fixed hours
💰 Competitive rates with prompt payments
🌍 Cutting-edge AI projects that shape human-machine communication
🤝 Global community of language and AI professionals
📈 Portfolio growth through diverse, innovative projects

📨 Hiring Process

Submit your application with a CV in English, complete the GenAI assessment, then finalize onboarding for project eligibility.

🚩 Heads Up

Role mixes engineering and language specialist expectations
Freelance/contract with no guaranteed hours or income stability
Extensive quality control process may lead to unpaid rework
