9h ago

Senior Applied Scientist - Multimodal

London

$130k-$180k / year (est.)

full-time senior Hybrid media

💼 About This Role

You'll develop and scale audio/video dataset generation and multimodal pipelines for lip sync models at Flawless, an AI company transforming Hollywood. Your work will directly improve model reliability and reduce production iteration time, enabling creators to reach global audiences. You'll operate at the intersection of research and production, bringing automation and rigor to model validation.

🎯 What You'll Do

  • Develop scalable audio/video dataset curation and lip sync training pipelines
  • Design and automate evaluation metrics for audio/video and lip sync quality
  • Collaborate with researchers to validate model improvements and support releases

📋 Requirements

  • MSc or PhD with industry experience in audio processing, 3D computer vision, or related multimodal fields
  • Proficiency in Python with strong computer science fundamentals
  • Expertise in deep learning frameworks (PyTorch) and vision tools (OpenCV)
  • Experience with audio-visual learning, multimodal fusion, or audio-driven face animation

✨ Nice to Have

  • Strong publication record at major venues (CVPR, SIGGRAPH, NeurIPS)
  • Experience developing multi-modal systems integrating audio, text, and visuals
  • Experience with generative and cross-domain attention models for audio/visual applications

🎁 Benefits & Perks

  • 🏖️ Hybrid working environment
  • 💰 Competitive Salary
  • 📈 Generous stock options for permanent employees
  • 🤝 Autonomy and collaborative culture

📨 Hiring Process

Four rounds: Recruiting Screen, Hiring Manager Screen, Skills Interview with take-home task, and a 2-hour onsite Team Interview.