Senior Site Reliability Engineer
United States
$141,000-$230,000 / year
Full-time, Senior, Remote, Cloud computing
Description
You will build and lead processes to ensure the reliability, availability, scalability, and performance of ClickHouse Cloud infrastructure. You'll collaborate with engineering teams to design fault-tolerant systems, manage incident response and post-mortem analysis, and develop software tools that improve operational efficiency.
Requirements
- Bachelor's or Master's degree in Computer Science or a related field
- At least 8 years of experience in Site Reliability Engineering or a related field
- Hands-on experience with Go and/or Python
- Strong knowledge of cloud computing platforms such as AWS, Azure, or GCP
- Experience with container orchestration tools like Kubernetes or Docker Swarm
Responsibilities
- Collaborate with engineering teams to design and implement scalable, secure, and highly available systems
- Establish and manage SLOs and SLAs for ClickHouse Cloud
- Ensure infrastructure components have monitoring and alerting for timely incident detection and resolution
- Enhance incident response processes and post-mortem analysis
- Plan and drive chaos engineering initiatives across engineering teams