DevOps / Site Reliability Engineer at Bespoke Labs

2d ago

Remote

✨ $140k-$180k / year (est.)

full-time · mid · Remote · ai-ml

💼 About This Role

You'll own and scale cloud infrastructure on AWS, managing Kubernetes clusters and CI/CD pipelines. You'll ensure reliability and performance of production systems supporting AI data pipelines. This role offers direct impact on frontier AI model training infrastructure.

🎯 What You'll Do

Own cloud infrastructure on AWS (EC2, EKS, RDS, S3, IAM, VPC)
Manage Kubernetes clusters and container orchestration
Build and maintain CI/CD pipelines using GitHub Actions
Implement monitoring, alerting, and observability stacks
Automate infrastructure with Terraform or similar IaC tools

📋 Requirements

3–5 years in DevOps, SRE, or infrastructure engineering
Strong AWS experience — EKS, EC2, RDS, S3, IAM
Kubernetes — deployment, scaling, troubleshooting in production
CI/CD pipelines — GitHub Actions, ArgoCD, or similar

✨ Nice to Have

Experience supporting ML training workloads or GPU clusters
Familiarity with distributed computing or large-scale data pipelines
Open-source contributions or published technical writing

🎁 Benefits & Perks

💰 Competitive compensation and meaningful equity
🌍 Remote-friendly environment with low bureaucracy
🏆 Direct impact on frontier AI model training and evaluation
🧠 Small, high-caliber team with deep AI research expertise
📚 Health, wellness, and learning & development benefits

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. Hiring Manager · 45 min
