1d ago
Member of Technical Staff - ML Infrastructure Engineer
Freiburg, Germany; San Francisco, USA
$180k-$300k / year
full-time · senior · Hybrid · ai-ml
💼 About This Role
You'll design, deploy, and maintain the ML infrastructure backbone for frontier AI research at the team behind Latent Diffusion and Stable Diffusion. Your work directly impacts multi-week training runs and production inference. We offer a collaborative, low-ego culture with real offices in Freiburg and SF.
🎯 What You'll Do
- Design and deploy cloud-based ML training clusters (Slurm) and inference clusters (Kubernetes)
- Manage network-based cloud file systems and blob/S3 storage for ML workloads
- Develop and maintain Infrastructure as Code (IaC) for resource provisioning
- Implement CI/CD pipelines for ML workflows and custom autoscaling solutions
📋 Requirements
- Strong proficiency in at least one cloud platform (AWS, Azure, or GCP), including its ML/AI services
- Extensive experience with Kubernetes and Slurm in production environments
- Expertise in Infrastructure as Code tools (Terraform, Ansible)
- Proven track record managing network-based cloud file systems and object storage for ML
✨ Nice to Have
- Experience building custom autoscaling solutions for ML workloads
- Knowledge of cost optimization strategies for cloud-based ML infrastructure
- Familiarity with MLOps practices and tools
🎁 Benefits & Perks
- 🏢 Full-time in-person collaboration with offices in Freiburg and SF
- 🏖️ Monthly in-person week for remote team members, with travel covered
- 💰 Base annual salary $180,000–$300,000 USD
- ⚛️ Frontier AI research with open science values
📨 Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Interview · 60 min
- 3. On-site / Final Round · 3-4 hours