23h ago

VLM Research Engineer

Berlin

$150k-$200k / year (est.)

full-time · senior · ai-ml

💼 About This Role

You'll push the limits of vision-language models for real-world video understanding at an AI-first startup. You'll design and adapt multimodal models and turn them into production pipelines used by customers. This role combines cutting-edge research with applied engineering in a fast-moving team.

🎯 What You'll Do

  • Design and adapt vision-language models for video understanding
  • Build and maintain large-scale training pipelines on GPU clusters
  • Curate and augment video-text and action datasets
  • Develop robust benchmarks for video QA and temporal understanding
  • Deliver production-ready inference pipelines to product teams

📋 Requirements

  • PhD in computer vision, machine learning, or a related field
  • Strong background in video-centric deep learning
  • Experience training large vision or vision-language models (e.g., InternVL)
  • Proven work with multi-GPU training (PyTorch, distributed)
  • Solid engineering habits: clean Python, reproducible experiments

✨ Nice to Have

  • Publications at top-tier venues (CVPR, ICCV, NeurIPS)
  • Experience with 3D/4D scene representations or action generation
  • Inference optimization: quantization, TensorRT, model distillation

🎁 Benefits & Perks

  • 💰 Competitive salary & stock options
  • 🌍 Collaborative, diverse team with flat hierarchy
  • Flexible working hours
  • 🎯 Real-world impact in manufacturing AI
  • 🤝 Supportive culture promoting underrepresented groups