Multimodal LLM Researcher
Palo Alto, CA
$185k-$400k / year
full-time · senior · ai-ml
About This Role
You'll lead research on real-time multimodal generation and agentic platforms at Pika, a pioneering creative AI startup. Your work will directly shape foundational technologies for interactive multimedia experiences that empower millions of creators. This role offers the chance to publish at top-tier venues and deploy groundbreaking models.
What You'll Do
- Lead research on real-time multimodal generation and agentic orchestration.
- Design algorithms for high-fidelity synthesis across text, image, video, and audio.
- Train and fine-tune autoregressive and diffusion models for real-time performance.
- Curate large multimodal datasets for video, audio, and cross-modal data.
- Publish findings at top conferences and collaborate with engineering teams.
Requirements
- 5+ years of experience with LLMs, VLMs, audio LMs, or deep learning.
- First-author publications at NeurIPS, CVPR, ICML, ICCV, SIGGRAPH, etc.
- Deep expertise in language modeling, vision-language modeling, or audio language modeling.
- Strong experience with autoregressive and diffusion models and real-time deployment.
Nice to Have
- Experience with diffusion model distillation or world models.
- Hands-on with agentic orchestration infrastructure.
- Passion for building creative tools and platforms.
Benefits & Perks
- Competitive salary and substantial equity
- Full health benefits + 401(k) matching
- Hybrid work from Palo Alto HQ with flexibility
Hiring Process
Estimated timeline: 2-4 weeks (AI estimate)
1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. Research Presentation · 60 min