Senior Site Reliability Engineer

Canada
Full-time, Senior, Remote, Cloud Computing / Data Analytics

Description

You'll lead the reliability, availability, and scalability of ClickHouse Cloud infrastructure, collaborate with engineering teams, manage SLOs and SLAs, improve incident response, and drive chaos engineering initiatives.

Requirements

  • Bachelor's or Master's degree in Computer Science or a related field
  • At least 8 years of SRE or related experience
  • Hands-on experience with Go and/or Python
  • Strong knowledge of AWS, Azure, or GCP
  • Experience with Kubernetes and automation tools like Terraform

Responsibilities

  • Collaborate to design scalable, secure, highly available systems
  • Establish and manage SLOs and SLAs for ClickHouse Cloud
  • Ensure monitoring and alerting for all infrastructure components
  • Enhance incident response processes and post-mortem analysis
  • Plan and drive chaos engineering initiatives across engineering teams