14h ago
AI Researcher (Multimodal Audio/Video Generation)
San Francisco, CA | London, UK
$160k-$240k / year (est.)
full-time · senior · Remote · ai-ml
About This Role
You'll lead research on audio-visual avatar generation for conversational AI humans. Your mission is to push generative models in diffusion and multimodal modeling to new frontiers. You'll translate cutting-edge research into production and publish at top venues.
What You'll Do
- Lead research on audio-visual generation for avatars (neural avatars, talking heads).
- Design models capturing verbal and non-verbal signals in conversation flow.
- Drive innovation in diffusion models, long-video generation, and audio-visual modeling.
- Translate research into production with Applied ML and engineering teams.
Requirements
- PhD or equivalent research experience.
- 2-3+ years of hands-on experience with generative models at scale.
- Expertise in diffusion models and efficiency techniques.
- Experience in multimodal generation (video, audio, language).
Nice to Have
- Skills in 3D graphics, Gaussian splatting, or large-scale training.
- Exposure to generative AI models beyond your specialty.
- Familiarity with software development best practices.
Benefits & Perks
- Series A backed by Sequoia, Y Combinator, and Scale Venture Partners.
- Remote within the US or Europe considered.
- Office in San Francisco (hybrid) or London.
Hiring Process
Estimated timeline: 3-5 weeks (AI estimate)
1. Recruiter Call · 30 min
2. Technical Interview · 60 min
3. Research Presentation · 45 min
4. Team Interview · 45 min
5. Offer · 15 min