1d ago

Machine Learning Systems & Infrastructure Engineer

London

✨ $125k-$175k / year (est.)

full-time · mid · ai-ml

💼 About This Role

You'll build and own the systems that turn raw real-world data into trained world models and reliable production endpoints for a generative 3D AI company. You will design, implement, and operate scalable training stacks, data ingestion pipelines, and model serving, working closely with the research team in a hands-on, code-heavy role. This is a unique opportunity to shape the infrastructure for next-generation world models.

🎯 What You'll Do

  • Own and evolve ML systems for training, evaluation, and serving of large foundation models.
  • Improve distributed training stacks (PyTorch DDP/FSDP) for performance and stability.
  • Build end-to-end data pipelines for ingestion, preprocessing, and storage at petabyte scale.
  • Operate ML workflow orchestration and model serving platforms (Kubeflow, Airflow, Modal).
  • Manage containerization, IaC (Terraform), and CI/CD for GPU workloads.

📋 Requirements

  • 3+ years writing production-quality Python in a large codebase.
  • Hands-on with modern ML training stacks (PyTorch, DDP/FSDP) and debugging distributed jobs.
  • Shipped end-to-end data pipelines at scale with real-world sources.
  • Proficient with containers (Docker, Kubernetes) and IaC (Terraform).

✨ Nice to Have

  • Experience with ML workflow orchestration (Kubeflow Pipelines, Airflow) and experiment tracking (MLflow).
  • Knowledge of observability tooling (Prometheus/Grafana, OpenTelemetry).

🎁 Benefits & Perks

  • 🏖️ Flexible time off
  • 💰 Equity package
  • 🧠 Learning budget
  • 🏢 Central London office
  • 🍽️ Daily lunch provided

📨 Hiring Process

Estimated timeline: 2-3 weeks · AI estimate

  1. Recruiter Call · 30 min
  2. Technical Interview · 60 min
  3. Final Round · 45 min