Research Scientist - Audiovisual Understanding, Model Foundations
Jerusalem
$150,000-$250,000 / year
Full-time, Senior, Artificial Intelligence
💼 About This Role
You'll improve video generation quality and efficiency by enhancing video and audio understanding pipelines for training data construction and model evaluation. You'll fine-tune large-scale Video Language Models (VLLMs) and implement classic computer vision and signal processing algorithms. This role offers hands-on work with petabyte-scale datasets and distributed systems.
🎯 What You'll Do
- Fine-tune and control VLLMs for video and audio understanding.
- Design algorithms for balancing, filtering, and curating training datasets.
- Implement algorithms for processing, clustering, and filtering large-scale datasets.
- Work within distributed systems handling petabytes of data.
📋 Requirements
- Experience training or fine-tuning VLLMs or multimodal foundation models.
- Strong software engineering skills in JAX or PyTorch.
- Ability to develop and implement computer vision models for data filtering.
- Solid understanding of statistics and clustering methods.