Senior Site Reliability Engineer

Australia (Remote)
Full-time, Senior, Remote, Cloud Computing / Real-time Analytics

Description

You will build and lead processes to ensure the reliability, availability, scalability, and performance of ClickHouse Cloud infrastructure. You will also collaborate with engineering teams to design scalable and fault-tolerant distributed systems, manage incident response and post-mortem analysis, and develop software platforms to optimize operational efficiency.

Requirements

  • 8+ years of experience in Site Reliability Engineering or related field
  • Hands-on experience with Go and/or Python
  • Strong knowledge of cloud computing platforms (AWS, Azure, GCP)
  • Hands-on experience with Kubernetes or Docker Swarm
  • Experience with automation tools like Ansible, Terraform, or Puppet

Responsibilities

  • Collaborate with engineering teams to design and implement scalable, secure, and highly available systems
  • Establish and manage SLOs and SLAs for ClickHouse Cloud
  • Ensure monitoring and alerting for all infrastructure components
  • Enhance incident response processes and conduct post-mortem analysis
  • Drive chaos engineering initiatives and manage on-call processes