Senior Site Reliability Engineer
Remote - Canada
Full-time · Senior · Remote · Software
💼 About This Role
You'll build and operate multi-region AWS infrastructure for high-fidelity virtual environments used to validate autonomous systems. Your core impact is ensuring the reliability and scalability of distributed simulation workloads in a startup setting. This role offers broad ownership across infrastructure, incident response, and security.
🎯 What You'll Do
- Design and maintain multi-region AWS infrastructure with Terraform.
- Operate and scale EKS clusters across production regions.
- Lead incident response, debugging, and root-cause analysis.
- Improve CI/CD pipelines and infrastructure validation tools.
📋 Requirements
- 5+ years in SRE, DevOps, or infrastructure engineering.
- Terraform modules, state management, and multi-environment patterns.
- AWS depth across VPC, IAM, EKS, and S3.
- Kubernetes cluster operations, autoscaling, and Helm.
✨ Nice to Have
- Windows on Kubernetes experience.
- GPU scheduling on Kubernetes.
- Simulation, ML, or rendering workload support.
📨 Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Interview · 60 min
- 3. Hiring Manager · 45 min