Senior Site Reliability Engineer

United States

$141,000-$230,000 / year

Full-time, Senior, Remote, Cloud computing

Description

You will build and lead processes that ensure the reliability, availability, scalability, and performance of ClickHouse Cloud infrastructure. You will collaborate with engineering teams to design fault-tolerant systems, manage incident response and post-mortem analysis, and develop software tools that improve operational efficiency.

Requirements

  • Bachelor's or Master's degree in Computer Science or a related field
  • At least 8 years of experience in Site Reliability Engineering or a related field
  • Hands-on experience with Go and/or Python
  • Strong knowledge of cloud computing platforms such as AWS, Azure, or GCP
  • Experience with container orchestration tools like Kubernetes or Docker Swarm

Responsibilities

  • Collaborate with engineering teams to design and implement scalable, secure, and highly available systems
  • Establish and manage SLOs and SLAs for ClickHouse Cloud
  • Ensure infrastructure components have monitoring and alerting for timely incident detection and resolution
  • Enhance incident response processes and post-mortem analysis
  • Plan and drive chaos engineering initiatives across engineering teams