Research Engineer - Data at Periodic Labs — CareerPair

Research Engineer - Data

Menlo Park

$350k-$400k / year

full-time · ai-ml · Visa Sponsor

💼 About This Role

You'll build and drive the data foundation for our research efforts, owning data strategy end-to-end from sourcing datasets to integrating experimental data into the training stack. You'll work closely with researchers to understand model needs and build pipelines to get the right data in the right shape. This role sits at the intersection of data engineering, research infrastructure, and strategy.

🎯 What You'll Do

Own data strategy across the training stack
Source, evaluate, and procure external datasets
Build and maintain robust data ingestion pipelines
Design data quality systems for deduplication and filtering
Integrate experimental data into the training stack

📋 Requirements

Large-scale data pipelines for LLM pretraining or midtraining
Data quality techniques such as MinHash, perplexity filtering, and classifier scoring
Scientific data formats (papers, patents, databases, lab exports)
Distributed processing with Spark, Ray, or Dask at petabyte scale
Python engineering in a production research environment

✨ Nice to Have

Scientific dataset curation for domain-adaptive continued pretraining
Synthetic data generation methods and pipelines
Physical science background (chemistry, physics, materials)
Multimodal data integration (text, numerical, molecular, spectral)

🎁 Benefits & Perks

🚀 Flexible location (Menlo Park or San Francisco preferred)
💼 Visa sponsorship available
💰 Competitive base salary $350k-$400k
🔬 Cutting-edge AI research environment
🌟 World-class team and investors

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. Team Interview · 60 min
4. Final Round · 60 min
