Research Engineer, Multimodal Data

San Francisco

$150k-$250k / year

full-time · ai-ml

💼 About This Role

You'll own the layer that makes petabytes of video queryable by content, running vision-language models over every clip so researchers can find relevant data in minutes. Your work directly accelerates customers' model-training iterations. This role combines research and engineering on a small, tight-knit team backed by top investors and partnered with leading Physical AI labs.
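
To make the retrieval layer concrete, here is a minimal sketch of text-to-clip search with a CLIP-style embedding model. The checkpoint, frame sampling, and brute-force index are illustrative assumptions, not this team's actual stack:

```python
# Minimal sketch: make video clips searchable by text with a CLIP-style
# model. Checkpoint choice, frame sampling, and the brute-force index
# are illustrative assumptions, not the company's actual pipeline.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_clip(frames: list[Image.Image]) -> np.ndarray:
    """Embed one clip as the mean of its sampled-frame embeddings."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize
    return feats.mean(dim=0).numpy()

def embed_query(text: str) -> np.ndarray:
    """Embed a free-text query into the same space as the clips."""
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats[0].numpy()

def search(query: str, index: np.ndarray, clip_ids: list[str], k: int = 5):
    """Brute-force cosine search over an (N, D) matrix of clip embeddings."""
    scores = index @ embed_query(query)
    top = np.argsort(-scores)[:k]
    return [(clip_ids[i], float(scores[i])) for i in top]
```

At petabyte scale the brute-force matrix product would give way to an approximate-nearest-neighbor index, but the shape of the pipeline is the same.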

🎯 What You'll Do

  • Own the visual understanding roadmap end-to-end.
  • Train, fine-tune, and evaluate VLMs and embedding models.
  • Drive down per-clip annotation cost at corpus scale.
  • Design taxonomies and instrument quality metrics for customer datasets.

📋 Requirements

  • Strong familiarity with modern vision and multimodal models (VLMs, VQA, embeddings).
  • Experience running these models at scale on real video/sensor data.
  • Background from a perception team at a self-driving, robotics, or visual-data company.
  • Comfortable with cloud infrastructure and large-scale data processing.

✨ Nice to Have

  • Experience training vision or multimodal models from scratch.
  • Hands-on time with big-data frameworks like Spark, Ray, or Daft (see the Ray sketch after this list).
  • Experience designing labeling taxonomies or running annotation programs.
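
As a rough illustration of the "at scale" half of the role, here is a hedged sketch of fanning clip embedding across a cluster with Ray Data. It reuses the hypothetical `embed_clip` from the sketch above; `sample_frames`, `all_clip_paths`, and the output bucket are illustrative stand-ins:

```python
# Hedged sketch: batch embedding over a clip corpus with Ray Data.
# Paths, batch size, and GPU allocation are illustrative assumptions.
import ray
from PIL import Image

def sample_frames(clip_path: str, n: int = 8) -> list[Image.Image]:
    """Hypothetical stand-in: decode n evenly spaced frames from a clip,
    e.g. via PyAV or decord."""
    raise NotImplementedError

def embed_batch(batch: dict) -> dict:
    # embed_clip is the hypothetical per-clip embedder from the sketch above.
    batch["embedding"] = [embed_clip(sample_frames(p)) for p in batch["clip_path"]]
    return batch

ds = ray.data.from_items([{"clip_path": p} for p in all_clip_paths])
ds = ds.map_batches(embed_batch, batch_size=32, num_gpus=1)
ds.write_parquet("s3://example-bucket/clip-embeddings/")
```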

🎁 Benefits & Perks

  • ๐Ÿฅ Health, vision, and dental coverage
  • ๐Ÿ–๏ธ Flexible PTO
  • ๐Ÿฑ Catered lunches and dinners
  • ๐Ÿš† Commuter benefit
  • ๐Ÿ’ป Latest Apple equipment

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. On-site Loop · 3 hours