Senior Site Reliability Engineer
Canada
Full-time · Senior · Remote · Cloud Computing / Data Analytics
Description
You'll lead reliability, availability, and scalability of ClickHouse Cloud infrastructure, collaborate with engineering teams, manage SLOs/SLAs, enhance incident response, and drive chaos engineering initiatives.
Requirements
- Bachelor's or Master's in Computer Science or related field
- At least 8 years of SRE or related experience
- Hands-on experience with Go and/or Python
- Strong knowledge of AWS, Azure, or GCP
- Experience with Kubernetes and automation tools like Terraform
Responsibilities
- Collaborate with engineering teams to design scalable, secure, highly available systems
- Establish and manage SLOs and SLAs for ClickHouse Cloud
- Ensure monitoring and alerting cover all infrastructure components
- Enhance incident response processes and post-mortem analysis
- Plan and drive chaos engineering initiatives across engineering teams