AI Benchmark Engineer at LILT (Production)

China (Remote)

✨ $70k-$100k / year (est.)

Contract · Senior · Remote · AI/ML

💼 About This Role

You'll design and validate multilingual benchmark tasks for large language models, focusing on terminal-based workflows and non-English data processing. Your work will support rigorous evaluation of the multilingual robustness of AI models.

🎯 What You'll Do

Design and build realistic multilingual task environments
Write deterministic verifier scripts for benchmark tasks
Analyze execution logs to calibrate task difficulty
Participate in a multi-layer quality assurance process

📋 Requirements

5+ years of industry software engineering experience
Native fluency in Mandarin Chinese
Strong proficiency in Python and shell scripting
Deep understanding of multilingual text processing

✨ Nice to Have

Experience with coding agents or LLM evaluation
Proven track record at leading tech companies
Familiarity with Unicode normalization and locale conventions

🎁 Benefits & Perks

🗓️ Flexible schedule as an independent contractor
💰 Competitive rates with prompt payments
🌍 Work on cutting-edge AI projects
🤝 Global community of language professionals

📨 Hiring Process

Estimated timeline: 1-2 weeks

1. Submit application with CV · 30 min
2. Complete GenAI assessment · 1 hour
3. Finalize onboarding and profile set-up · 1 hour
