You'll spearhead the launch of a **new vertical** for a fast-growing AI data company, building the roadmap, forging partnerships, and closing deals. You'll drive **product-market fit** by connecting model builders with data holders in a high-ambiguity environment.
Remote|Lead|Full-time|Ai-ml
Forward Deployed Engineer, Healthcare
You'll **own healthcare customer engagements end-to-end**, from scoping through delivery and support. You'll build **custom solutions** for complex needs and **bridge customer requirements** with Protege's platform capabilities. This role offers a chance to shape AI training data exchange.
Remote|Senior|Full-time|Healthcare
Forward Deployed Engineer - New Verticals
You'll partner with the GM and early customers to define technical needs for a new vertical, building reusable infrastructure on top of our platform. You'll own first customer engagements end-to-end, writing robust code that solves immediate problems while compounding into reusable systems. This role sits at the intersection of **engineering**, **product judgment**, and **customer reality**.
Remote|Senior|Full-time|Ai-ml
Senior Software Engineer, Data Processing
You'll own the data processing layer at ingestion, turning large-scale source data into clean, structured, AI-ready datasets. Your core impact is building robust **ingestion and processing systems** for multimodal data at massive scale. This role offers deep ownership of critical infrastructure in a fast-moving AI data exchange platform.
Remote|Senior|Full-time|Ai-ml
Senior Researcher - Diffusion and Vision Research
You'll lead **evaluation and optimization** of large-scale datasets for **generative video models** at Protege, a platform for AI training data exchange. Your work will directly impact model quality and define the value of our video data assets. This role offers the chance to shape the future of data-centric AI at a fast-moving startup backed by top investors.
Remote|Senior|Full-time|Ai-ml
You'll own security end-to-end at a fast-growing AI data startup, building the program from the ground up. You'll partner with engineering, product, and legal to embed security into **cloud infrastructure** and **data pipelines**, while earning trust with AI companies and data partners. This hands-on leadership role reports to the VP of Engineering and is key to shaping how we protect training data.
Remote|Lead|Full-time|Ai-ml
Solutions Applied Data Scientist, Healthcare
You'll act as a technical partner to Solutions Leads, solving complex **healthcare data cohort construction** and **multi-source dataset assembly** challenges for AI model training. You'll translate customer requirements into practical dataset definitions and write the SQL that builds those datasets. This role focuses on solving real-world data problems, not research.
Remote|Mid|Full-time|Healthcare
Solutions Engineer, Healthcare
You'll own cross-cloud data movement and delivery for large-scale healthcare datasets, using tools like rclone and Python to ensure safe, repeatable operations. You'll debug failures and maintain high data integrity for regulated healthcare data. This role combines **production engineering** with **data pipeline reliability** in a fast-moving AI startup.
Remote|Mid|Full-time|Healthcare
Director of Research, DataLab
You'll lead Protege's DataLab research arm, tackling foundational challenges in AI training data. You'll define the research agenda, build rigorous experimentation systems, and ensure research informs product and customer strategy. This role combines deep technical leadership with direct customer impact.
Remote|Lead|Full-time|Ai-ml
Machine Learning Researcher - RL and Agentic Systems
You'll design datasets, tasks, and environments for benchmarking agentic systems, working closely with research and engineering teams. You'll develop frameworks for evaluating real-world data quality and benchmark model behavior in **RL and agentic settings**. This role connects applied research directly to real-world deployment at a fast-moving AI data startup.
Remote|Senior|Full-time|Ai-ml
Machine Learning Researcher - Audio
You'll develop **audio data quality metrics and evaluation frameworks** for training speech and multimodal models. Your work will directly impact **how large-scale speech datasets are assessed and improved**, connecting signal-level quality to downstream model performance. This role combines research with practical tool-building for a fast-moving AI data startup.
Partnerships Operations Lead (Healthcare)
You'll **operationalize healthcare data partnerships** for an **AI training data platform**, owning onboarding, technical integration, and compliance. You'll ensure reliable data delivery for customer engagements and scale partner ecosystems. This role sits at the intersection of partnerships, science, product, and delivery.
Remote|Senior|Full-time|Healthcare
Product Manager - Privacy, Rights & Trust
You'll own the privacy product from discovery through execution, defining the roadmap and building trusted capabilities for AI training data exchange. Your core impact will be choosing the right first wedge, scoping it with discipline, and navigating complex stakeholder landscapes. This role bridges **research translation**, **vendor strategy**, and **external credibility** in a fast-moving AI data startup.
Remote|Senior|Full-time|Ai-ml
Solutions Engineer (Media)
You'll own **data quality and curation** for Protege's media catalog, translating customer AI data needs into structured datasets. You'll work with **imperfect, real-world partner data** using SQL, embeddings, and AI tools. This role is central to **delivering high-quality training data** that powers ambitious AI teams.
Remote|Mid|Full-time|Ai-ml
Research Scientist, Benchmarks & Evaluations
You'll lead the design of benchmarks and evaluations that frontier labs, enterprises, and policymakers can trust. Your work will directly shape **eval datasets** that distinguish capability levels across frontier models, including **agentic** and **domain-specific** settings. You'll publish research establishing Protege as the **standard-setter** for evaluation data.
Remote|Senior|Full-time|Ai-ml