10h ago

AI Benchmark Engineer

China (Remote)

✨ $70k-$100k / year (est.)

contract · senior · Remote · ai-ml

🛠 Tech Stack

💼 About This Role

You'll design and validate multilingual benchmark tasks for large language models, focusing on terminal-based workflows and non-English data processing. Your work will ensure rigorous evaluation of the multilingual robustness of AI models.

🎯 What You'll Do

  • Design and build realistic multilingual task environments
  • Write deterministic verifier scripts for benchmark tasks
  • Analyze execution logs to calibrate task difficulty
  • Participate in a multi-layer quality assurance process

📋 Requirements

  • 5+ years of industry software engineering experience
  • Native fluency in Mandarin Chinese
  • Strong proficiency in Python and shell scripting
  • Deep understanding of multilingual text processing

✨ Nice to Have

  • Experience with coding agents or LLM evaluation
  • Proven track record at leading tech companies
  • Familiarity with Unicode normalization and locale conventions

🎁 Benefits & Perks

  • Flexible schedule as an independent contractor
  • Competitive rates with prompt payments
  • Work on cutting-edge AI projects
  • Global community of language professionals

📨 Hiring Process

Estimated timeline: 1-2 weeks

  1. Submit application with CV · 30 min
  2. Complete GenAI assessment · 1 hour
  3. Finalize onboarding and profile set-up · 1 hour
