Research Engineer, Multimodal Data
San Francisco
$150k-$250k / year
Full-time · AI/ML
About This Role
You'll own the layer that makes petabytes of video queryable by content, running vision-language models over every clip so researchers can find relevant data in minutes. Your work directly accelerates customers' model-training iterations. The role combines research and engineering in a small, tight-knit team backed by top investors and partnered with leading Physical AI labs.
What You'll Do
- Own the visual understanding roadmap end-to-end.
- Train, fine-tune, and evaluate VLMs and embedding models.
- Drive down per-clip annotation cost at corpus scale.
- Design taxonomies and instrument quality for customer datasets.
Requirements
- Strong familiarity with modern vision and multimodal models (VLMs, VQA, embeddings).
- Experience running these models at scale on real video/sensor data.
- Background from a perception team at a self-driving, robotics, or visual-data company.
- Comfortable with cloud infrastructure and large-scale data processing.
Nice to Have
- Experience training vision or multimodal models from scratch.
- Hands-on time with big-data frameworks like Spark, Ray, or Daft.
- Experience designing labeling taxonomies or running annotation programs.
Benefits & Perks
- Health, vision, and dental coverage
- Flexible PTO
- Catered lunches and dinners
- Commuter benefits
- Latest Apple equipment
Hiring Process
Estimated timeline: 2-4 weeks (AI estimate)
1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. On-site Loop · 3 hours