AI Benchmark Engineer at LILT (Production)

5h ago

AI Benchmark Engineer

Turkey

✨ $100k-$180k / year (est.)

contract · senior · Remote · ai-ml

💼 About This Role

You'll build multilingual evaluation tasks for large language models, focusing on terminal-based software challenges. The tasks measure how robustly these models handle multilingual input across encoding and locale edge cases. This is a remote freelance role with flexible hours.

🎯 What You'll Do

Design and build benchmark tasks for coding agents
Create realistic task environments using native language data
Develop robust reference implementations and verifier scripts
Calibrate task difficulty across multiple model tiers

📋 Requirements

5+ years of software engineering experience
Native Turkish fluency with deep grammar knowledge
Strong proficiency in Python, shell scripting, and data processing
Experience with terminal/CLI development workflows

✨ Nice to Have

Background at leading tech companies or top-tier universities
Knowledge of Unicode normalization and locale-dependent conventions
Familiarity with coding agents

🎁 Benefits & Perks

🗓️ Flexible schedule as an independent contractor
💰 Competitive rates with prompt payments
🌍 Work on cutting-edge AI and language technology
🤝 Join a global community of language professionals
📚 Access to diverse, innovative projects

📨 Hiring Process

Submit an application with your CV, complete a GenAI assessment, and finalize onboarding.
