Research Engineer – Evals
San Francisco, CA
$160k-$240k / year
Full-time · Senior · Hybrid · Software
💼 About This Role
You'll build the evaluation systems that tell us whether Firecrawl actually works, designing metrics and pipelines to measure output quality across millions of websites. You'll own the feedback loop from quality measurement back to model and product decisions, working closely with RL and Search/IR engineers to turn evaluations into training signals. This role offers deep technical ownership and the chance to define what "good" means for web data extraction at scale.
🎯 What You'll Do
- Build eval stack from scratch, defining metrics and pipelines
- Design benchmark datasets that cover the real-world distribution of customer data
- Own LLM-as-judge pipelines for automated extraction quality scoring
- Close the loop between evals and model training via RL/feedback signals
- Run fast experiments and communicate results clearly to the team
📋 Requirements
- 3+ years in ML engineering, applied AI, or data quality with production systems
- Experience building eval infrastructure at scale, including pipeline and dataset curation
- Deep understanding of LLM evaluation methodology, including LLM-as-judge pitfalls
- Production experience with unstructured web data and quality metrics
✨ Nice to Have
- Experience with RLHF pipelines and reward modeling
- Background in building human review tooling for data quality
- Familiarity with web scraping, dynamic rendering, and SPAs
🎁 Benefits & Perks
- 🏖️ Unlimited PTO
- 💰 Equity up to 0.10%
- 🏢 Hybrid or remote (Americas time zones)
- 🚀 High-velocity iteration and deployment
🚨 Hiring Process
Estimated timeline: 2-3 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Interview · 60 min
- 3. Hiring Manager Chat · 45 min