Senior Site Reliability Engineer
Australia (Remote)
Full-time | Senior | Remote | Cloud Computing / Real-time Analytics
Description
You will build and lead processes to ensure the reliability, availability, scalability, and performance of ClickHouse Cloud infrastructure. Collaborate with engineering teams to design scalable and fault-tolerant distributed systems, manage incident response and post-mortem analysis, and develop software platforms to optimize operational efficiency.
Requirements
- 8+ years of experience in Site Reliability Engineering or a related field
- Hands-on experience with Go and/or Python
- Strong knowledge of cloud computing platforms (AWS, Azure, GCP)
- Hands-on experience with Kubernetes or Docker Swarm
- Experience with automation tools such as Ansible, Terraform, or Puppet
Responsibilities
- Collaborate with engineering teams to design and implement scalable, secure, and highly available systems
- Establish and manage SLOs and SLAs for ClickHouse Cloud
- Ensure monitoring and alerting for all infrastructure components
- Enhance incident response processes and conduct post-mortem analysis
- Drive chaos engineering initiatives and manage on-call processes