Multimodal LLM Researcher
Palo Alto, CA
$185k-$400k / year
full-time · senior · ai-ml
About This Role
You'll lead research on real-time multimodal generation and agentic platforms at Pika, a pioneering creative AI startup. Your work will directly shape foundational technologies for interactive multimedia experiences that empower millions of creators. This role offers the chance to publish at top-tier venues and deploy groundbreaking models.
What You'll Do
- Lead research on real-time multimodal generation and agentic orchestration.
- Design algorithms for high-fidelity synthesis across text, image, video, and audio.
- Train and fine-tune autoregressive and diffusion models for real-time performance.
- Curate large multimodal datasets for video, audio, and cross-modal data.
- Publish findings at top conferences and collaborate with engineering teams.
Requirements
- 5+ years of experience with LLMs, VLMs, audio LMs, or deep learning.
- First-author publications at NeurIPS, CVPR, ICML, ICCV, SIGGRAPH, etc.
- Deep expertise in language modeling, vision-language modeling, or audio language modeling.
- Strong experience with autoregressive and diffusion models and real-time deployment.
Nice to Have
- Experience with diffusion model distillation or world models.
- Hands-on with agentic orchestration infrastructure.
- Passion for building creative tools and platforms.
Benefits & Perks
- Competitive salary and substantial equity
- Full health benefits + 401(k) matching
- Hybrid work from Palo Alto HQ with flexibility
Hiring Process
Estimated timeline: 2-4 weeks (AI estimate)
1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. Research Presentation · 60 min