Senior Software Engineer - AI Evaluation & Benchmarks
US
$166.4k-$208k / year (annualized from $80-$100/hr)
Full-time | Senior | Remote | AI/ML
About This Role
You'll design and build coding benchmarks that evaluate frontier AI models on real-world software engineering tasks. Your work will directly influence how next-generation models are trained and improved. The role sits at the intersection of software engineering and AI research: you'll develop scalable systems that run evaluations across large codebases.
What You'll Do
- Design and build coding benchmarks for frontier AI models
- Develop scalable evaluation pipelines and data infrastructure
- Analyze AI-generated code for correctness and performance issues
- Contribute to the design and evolution of evaluation methodologies
Requirements
- 4+ years of professional software engineering experience
- Expert-level Python development skills
- Experience with large, complex, production-grade codebases
- Experience building or contributing to LLM evaluation systems
Nice to Have
- Familiarity with JavaScript, Go, or C++
- Background in ML evaluation methodologies
- Open-source contributions or security engineering experience
Benefits & Perks
- Competitive hourly compensation ($80-$100/hr)
- Fully remote with global flexibility
- Weekly payments via PayPal or Stripe
- Short-term 3-month contract with potential extension
- Work on cutting-edge AI systems
Hiring Process
Estimated timeline: 2-4 weeks (AI estimate)
- Step 1: Recruiter screen (30 min)
- Step 2: Technical interview (60 min)
- Step 3: Hiring decision (1-2 weeks)