10h ago
AI Benchmark Engineer
China (Remote)
$70k-$100k / year (est.)
contract · senior · Remote · ai-ml
About This Role
You'll design and validate multilingual benchmark tasks for large language models, focusing on terminal-based workflows and non-English data processing. Your work will ensure rigorous evaluation of AI multilingual robustness.
What You'll Do
- Design and build realistic multilingual task environments
- Write deterministic verifier scripts for benchmark tasks
- Analyze execution logs to calibrate task difficulty
- Participate in a multi-layer quality assurance process
Requirements
- 5+ years of industry software engineering experience
- Native fluency in Mandarin Chinese
- Strong proficiency in Python and shell scripting
- Deep understanding of multilingual text processing
Nice to Have
- Experience with coding agents or LLM evaluation
- Proven track record at leading tech companies
- Familiarity with Unicode normalization and locale conventions
Benefits & Perks
- Flexible schedule as an independent contractor
- Competitive rates with prompt payments
- Work on cutting-edge AI projects
- Global community of language professionals
Hiring Process
Estimated timeline: 1-2 weeks
- 1. Submit application with CV · 30 min
- 2. Complete GenAI assessment · 1 hour
- 3. Finalize onboarding and profile set-up · 1 hour