Multimodal LLM Researcher

Palo Alto, CA

$185k-$400k / year

full-time · senior · ai-ml

💼 About This Role

You'll lead research on real-time multimodal generation and agentic platforms at Pika, a pioneering creative AI startup. Your work will directly shape foundational technologies for interactive multimedia experiences that empower millions of creators. This role offers the chance to publish at top-tier venues and deploy groundbreaking models.

🎯 What You'll Do

  • Lead research on real-time multimodal generation and agentic orchestration.
  • Design algorithms for high-fidelity synthesis across text, image, video, and audio.
  • Train and fine-tune autoregressive and diffusion models for real-time performance.
  • Curate large multimodal datasets for video, audio, and cross-modal data.
  • Publish findings at top conferences and collaborate with engineering teams.

📋 Requirements

  • 5+ years of experience in LLM, VLM, audio LM, or deep learning.
  • First-author publications at NeurIPS, CVPR, ICML, ICCV, SIGGRAPH, etc.
  • Deep expertise in language modeling, vision-language modeling, or audio language modeling.
  • Strong experience with autoregressive and diffusion models and real-time deployment.

✨ Nice to Have

  • Experience with diffusion model distillation or world models.
  • Hands-on experience with agentic orchestration infrastructure.
  • Passion for building creative tools and platforms.

๐ŸŽ Benefits & Perks

  • 💰 Competitive salary and substantial equity
  • 🏥 Full health benefits + 401k matching
  • 🏢 Hybrid work from Palo Alto HQ with flexibility

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. Research Presentation · 60 min