Senior Site Reliability Engineer

Remote - Canada
Full-time · Senior · Remote · Software

💼 About This Role

You'll build and operate multi-region AWS infrastructure for high-fidelity virtual environments used to validate autonomous systems. Your core impact is ensuring reliability and scalability of distributed simulation workloads at a startup. This role offers broad ownership across infrastructure, incident response, and security.

🎯 What You'll Do

  • Design and maintain multi-region AWS infrastructure with Terraform.
  • Operate and scale EKS clusters across production regions.
  • Lead incident response, debugging, and root-cause analysis.
  • Improve CI/CD pipelines and infrastructure validation tools.

📋 Requirements

  • 5+ years in SRE, DevOps, or infrastructure engineering.
  • Terraform modules, state management, multi-environment patterns.
  • AWS depth across VPC, IAM, EKS, and S3.
  • Kubernetes cluster operations, autoscaling, and Helm.

✨ Nice to Have

  • Windows on Kubernetes experience.
  • GPU scheduling on Kubernetes.
  • Simulation, ML, or rendering workload support.

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. Hiring Manager · 45 min