Senior Software Engineer — AI Evaluation & Benchmarks (Python) at G2i Inc. — CareerPair

1d ago

Miami

$166.4k–$208k / year

Contract · Senior · Remote · AI/ML

💼 About This Role

You'll design and build coding benchmarks and evaluation pipelines to test frontier AI models on real software engineering work. Your work will directly shape how the coding ability of these models is measured and improved.

🎯 What You'll Do

Design coding benchmarks that test frontier AI models on real-world programming tasks.
Build and maintain scalable data pipelines for evaluation workflows.
Analyze model-generated code for correctness, reliability, and edge-case failures.
Construct structured evaluation scenarios across large repositories and multi-language environments.

📋 Requirements

4+ years of professional software engineering experience.
Expert Python skills, with a record of writing clean, performant, well-tested code.
Hands-on experience in large, complex codebases.
Proven experience designing and implementing LLM coding benchmarks and evaluation data pipelines.

✨ Nice to Have

Senior- or lead-level profile with a history of technical ownership.
Proficiency in additional languages: JavaScript, Go, C++.
Experience with CI/CD and robust unit testing (pytest, Mocha, JUnit).

🎁 Benefits & Perks

🌍 Fully remote — work from anywhere in accepted locations.
💵 Competitive hourly rate of $80–$100/hr, based on location and seniority.
📆 Weekly payments via PayPal or Stripe.
🔁 Potential extension beyond the initial 3-month contract.

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

1. Application review · 1–2 weeks
2. Technical interview · 1 hour
3. Offer · 1 week

🚩 Heads Up

Contract role with variable hours — not suitable as sole income.
Requires identity verification and proof of valid work documentation.
No visa sponsorship; incompatible with F-1 OPT or STEM OPT.
