Machine Learning & Cloud Infra Engineer at SpAItial

15h ago

London

✨ $150k-$200k / year (est.)

full-time · mid · ai-ml

💼 About This Role

You'll build and own the infrastructure for training massive generative 3D models at SpAItial. You'll design GPU clusters, distributed training systems, and storage pipelines that enable researchers to train world-scale models efficiently. This role combines deep systems engineering with direct impact on cutting-edge AI research.

🎯 What You'll Do

Own and evolve ML + cloud infrastructure for training massive foundation models
Design and operate GPU clusters with scheduling and capacity planning
Support distributed training stacks (PyTorch DDP/FSDP) for performance and stability
Build and optimize storage systems for petabyte-scale datasets
Package and deploy workloads with Docker, Kubernetes, and Terraform

📋 Requirements

3+ years of professional experience in infrastructure, platform, or cloud engineering
Hands-on experience with GPU compute and performance debugging (CUDA/NCCL)
Strong experience operating cloud environments (AWS, GCP, or Azure)
Proficiency with containers and orchestration (Docker, Kubernetes) and infrastructure-as-code (Terraform)

✨ Nice to Have

ML infrastructure experience
Experience with monitoring and observability tooling (Prometheus/Grafana, ELK)
Experience building CI/CD for infra and ML workflows (CircleCI, GitHub Actions)

🎁 Benefits & Perks

🏖️ Flexible PTO
💰 Equity
🏥 Health Insurance
📚 Learning Budget

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. Team Interview · 45 min
