AI Researcher (Multimodal Audio/Video Generation)

San Francisco, CA | London, UK

✨ $160k-$240k / year (est.)

Full-time | Senior | Remote | AI/ML

💼 About This Role

You'll lead research on audio-visual avatar generation for conversational AI humans. Your mission is to advance the state of the art in diffusion-based and multimodal generative modeling. You'll translate cutting-edge research into production and publish at top venues.

🎯 What You'll Do

  • Lead research on audio-visual generation for avatars (Neural Avatars, Talking-Heads).
  • Design models capturing verbal and non-verbal signals in conversation flow.
  • Drive innovation in diffusion models, long-video generation, and audio-visual modeling.
  • Translate research into production with Applied ML and engineering teams.

📋 Requirements

  • PhD or equivalent research experience.
  • 2-3+ years of hands-on experience with generative models at scale.
  • Expertise in diffusion models and efficiency techniques.
  • Experience in multimodal generation (video, audio, language).

✨ Nice to Have

  • Skills in 3D graphics, Gaussian splatting, or large-scale training.
  • Exposure to generative AI models beyond your specialty.
  • Familiarity with software development best practices.

🎁 Benefits & Perks

  • Series A backed by Sequoia, Y Combinator, Scale Venture Partners.
  • Remote within US or Europe considered.
  • Office in San Francisco (hybrid) or London.

📨 Hiring Process

Estimated timeline: 3-5 weeks · AI estimate

  1. Recruiter Call · 30 min
  2. Technical Interview · 60 min
  3. Research Presentation · 45 min
  4. Team Interview · 45 min
  5. Offer · 15 min