Senior / Staff AI Research Engineer, Data Infrastructure
Milpitas, CA
full-time, senior, robotics
Description
You will own the data and learning engine behind RoboForce's Physical AI stack. You will build pipelines that span raw teleoperation data collection, curation, annotation, and storage, along with the post-training infrastructure that scores demonstrations and closes the loop for model retraining.
Requirements
- 5+ years of experience
- Strong proficiency in Python and experience building production-grade data pipelines and ETL systems
- Hands-on experience with large-scale dataset management, including versioning, deduplication, and quality filtering, and with distributed storage and dataset formats (S3, GCS, HDF5, WebDataset, Zarr)
- Experience with post-training infrastructure, such as supervised fine-tuning (SFT) pipelines, reward modeling, or RL training loops (PPO, DPO, rejection sampling)
- Familiarity with deep learning frameworks (PyTorch, JAX) and ML training workflows
Responsibilities
- Design and maintain end-to-end data collection pipelines for multimodal demonstration data from teleoperation devices and UMI hardware
- Build annotation tooling and data curation workflows for quality filtering, deduplication, episode scoring, and domain reweighting
- Develop post-SFT reinforcement learning infrastructure including reward scoring, failure pattern mining, and retraining loop integration
- Build evaluation and test infrastructure to log policy rollouts, capture results, and surface diagnostics
- Collaborate with ML researchers to define data schemas, episode formats, and pipeline interfaces for vision-language-action (VLA) and manipulation policy training