Senior Site Reliability Engineer
Singapore
Full-time, Senior, Remote, Cloud Computing / Real-time Analytics
Description
You will join ClickHouse's central SRE team to ensure the reliability, availability, scalability, and performance of our cloud infrastructure. Working with multiple engineering teams, you will design scalable systems, manage SLOs and SLAs, improve incident response, and drive chaos engineering initiatives to continuously improve service reliability.
Requirements
- Bachelor's or Master's degree in Computer Science or a related field
- At least 8 years of experience in Site Reliability Engineering or a related field
- Hands-on experience with Go and/or Python
- Strong knowledge of cloud computing platforms (AWS, Azure, or GCP)
- Experience with container orchestration (Kubernetes or Docker Swarm)
Responsibilities
- Work with engineering teams to design and implement scalable, secure, and highly available systems
- Establish and manage SLOs and SLAs for ClickHouse Cloud
- Ensure monitoring and alerting for all infrastructure components
- Enhance incident response processes and post-mortem analysis
- Plan and drive chaos engineering initiatives across engineering teams