Research Scientist - Audiovisual Understanding, Model Foundations

Jerusalem

$150,000-$250,000 / year

Full-time, Senior, Artificial Intelligence

💼 About This Role

You'll improve video generation quality and efficiency by enhancing video and audio understanding pipelines for training data construction and model evaluation. You'll fine-tune large-scale Video Language Models (VLLMs) and implement classic computer vision and signal processing algorithms. This role offers hands-on work with petabyte-scale datasets and distributed systems.

🎯 What You'll Do

  • Fine-tune and control VLLMs for video and audio understanding.
  • Design algorithms for balancing, filtering, and curating training datasets.
  • Implement algorithms for processing, clustering, and filtering large-scale datasets.
  • Work within distributed systems handling petabytes of data.

📋 Requirements

  • Experience training or fine-tuning VLLMs or multimodal foundation models.
  • Strong software engineering skills in JAX or PyTorch.
  • Ability to develop and implement computer vision models for data filtering.
  • Understanding of statistics and clustering methods.