Member of Technical Staff - ML Infrastructure Engineer

Freiburg, Germany; San Francisco, USA

$180k-$300k / year

Full-time · Senior · Hybrid · AI/ML

💼 About This Role

You'll design, deploy, and maintain the ML infrastructure that underpins frontier AI research for the team behind Latent Diffusion and Stable Diffusion. Your work directly supports multi-week training runs and production inference. The team offers a collaborative, low-ego culture with physical offices in Freiburg and San Francisco.

🎯 What You'll Do

  • Design and deploy cloud-based ML training clusters (Slurm) and inference clusters (Kubernetes)
  • Manage network-based cloud file systems and blob/S3 storage for ML workloads
  • Develop and maintain Infrastructure as Code (IaC) for resource provisioning
  • Implement CI/CD pipelines for ML workflows and custom autoscaling solutions

📋 Requirements

  • Strong proficiency in cloud platforms (AWS, Azure, or GCP) with ML/AI services
  • Extensive experience with Kubernetes and Slurm in production environments
  • Expertise in Infrastructure as Code tools (Terraform, Ansible)
  • Proven track record managing network-based cloud file systems and object storage for ML

✨ Nice to Have

  • Experience building custom autoscaling solutions for ML workloads
  • Knowledge of cost optimization strategies for cloud-based ML infrastructure
  • Familiarity with MLOps practices and tools

🎁 Benefits & Perks

  • 🏢 Full-time in-person collaboration with offices in Freiburg and SF
  • 🏖️ Monthly in-person week for remote team members, with travel covered
  • 💰 Base annual salary $180,000–$300,000 USD
  • ⚛️ Frontier AI research with open science values

📨 Hiring Process

Estimated timeline: 2–4 weeks (AI-generated estimate)

  1. Recruiter screen · 30 min
  2. Technical interview · 60 min
  3. On-site / final round · 3–4 hours