5h ago
Member of Technical Staff (Data): World Models
Remote
✨ $150k-$250k / year (est.)
full-time · mid · Remote · ai-ml
💼 About This Role
You'll own the data pipelines and storage systems that feed petabyte-scale multimodal datasets into model training at an AI company. Your impact will come from building automated, efficient tooling that enables data processing at scale. This role is fully remote, with an async-first culture and startup equity.
🎯 What You'll Do
- Design, automate, and optimize Python ETL pipelines (Spark/Ray) for multimodal data.
- Build data cataloging, quality tooling, and lifecycle management systems.
- Provide guidance and documentation on data best practices.
- Serve as custodian of datasets, ensuring data health and quality.
📋 Requirements
- Experience building Python ETL pipelines with Spark or Ray.
- Experience with large-scale data formats and storage systems.
- ML fundamentals for collaboration with researchers.
- Ability to write high-quality specifications for AI agents.
✨ Nice to Have
- Experience with data versioning and annotation tools.
- Knowledge of multimodal data processing.
- Experience with petabyte-scale datasets.
🎁 Benefits & Perks
- 💰 Competitive salary and equity
- 🏥 Private health coverage
- 💻 Hardware setup of your choice
- 🌍 Fully-distributed, async-first culture
- 🍽️ Stipends for phone, internet, and meals
🚩 Heads Up
- The posting explicitly mentions occasional late nights and weekend work.
- Broad range of required skills spanning data engineering, ML, and agentic engineering.