Senior / Staff AI Research Engineer, Data Infrastructure

Milpitas, CA
full-time, senior, robotics

Description

You will own the data and learning engine behind RoboForce's Physical AI stack. You will build pipelines that span raw teleoperation data collection, curation, annotation, and storage, through to the post-training infrastructure that scores demonstrations and closes the loop for model retraining.

Requirements

  • 5+ years of experience
  • Strong proficiency in Python and in building production-grade data pipelines and ETL systems
  • Hands-on experience with large-scale dataset management, versioning, deduplication, quality filtering, and distributed storage and dataset formats (S3, GCS, HDF5, WebDataset, Zarr)
  • Experience with post-training infrastructure: SFT pipelines, reward modeling, or RL training loops (PPO, DPO, rejection sampling)
  • Familiarity with deep learning frameworks (PyTorch, JAX) and ML training workflows

Responsibilities

  • Design and maintain end-to-end data collection pipelines for multimodal demonstration data from teleoperation devices and UMI hardware
  • Build annotation tooling and data curation workflows for quality filtering, deduplication, episode scoring, and domain reweighting
  • Develop post-SFT reinforcement learning infrastructure including reward scoring, failure pattern mining, and retraining loop integration
  • Build evaluation and test infrastructure to log policy rollouts, capture results, and surface diagnostics
  • Collaborate with ML researchers to define data schemas, episode formats, and pipeline interfaces for VLA and manipulation policy training