
Senior Software Engineer - AI Evaluation & Benchmarks

US

$166.4k-$208k / year (annualized from $80-$100/hr at 2,080 hours/year)

full-time · senior · Remote · ai-ml

💼 About This Role

You'll design and build coding benchmarks that evaluate frontier AI models on real-world software engineering tasks. Your work directly influences how next-generation models are trained and improved. This role sits at the intersection of software engineering and AI research, where you'll develop scalable systems to run evaluations across large codebases.

🎯 What You'll Do

  • Design and build coding benchmarks for frontier AI models
  • Develop scalable evaluation pipelines and data infrastructure
  • Analyze AI-generated code for correctness and performance issues
  • Contribute to the design and evolution of evaluation methodologies

📋 Requirements

  • 4+ years of professional software engineering experience
  • Expert-level Python development skills
  • Experience with large, complex, production-grade codebases
  • Experience building or contributing to LLM evaluation systems

✨ Nice to Have

  • Familiarity with JavaScript, Go, or C++
  • Background in ML evaluation methodologies
  • Open-source contributions or security engineering experience

🎁 Benefits & Perks

  • 💰 Competitive hourly compensation ($80-$100/hr)
  • 🌍 Fully remote with global flexibility
  • 📆 Weekly payments via PayPal or Stripe
  • ⏳ Short-term 3-month contract with potential extension
  • 🚀 Work on cutting-edge AI systems

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter screen · 30 min
  2. Technical interview · 60 min
  3. Hiring decision · 1-2 weeks