Staff Site Reliability Engineer

Remote (United States)

$220k-$325k / year

Full-time · Senior · Remote · Software

💼 About This Role

You'll join the SRE team to ensure the reliability, scalability, and performance of Replit's infrastructure, which serves millions of developers. You'll proactively identify reliability problems, implement observability solutions, and drive automation that delivers step-function improvements. You'll also mentor engineers and help make reliability a core value across the company.

🎯 What You'll Do

  • Architect and implement comprehensive monitoring, logging, and tracing solutions.
  • Define and track SLOs/SLIs with product and engineering teams.
  • Lead incident management and conduct blameless post-mortems.
  • Drive automation and infrastructure as code using Terraform or Pulumi.

📋 Requirements

  • 8-10 years of experience in Site Reliability Engineering or similar roles.
  • Strong programming skills in Python or Go.
  • Deep understanding of distributed systems and production services.
  • Deep experience with Kubernetes and container orchestration.

✨ Nice to Have

  • Deep experience with Google Cloud Platform (GCP) services and tools.
  • Expert-level knowledge of modern observability platforms (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).
  • Significant experience with Go and Terraform.

🎁 Benefits & Perks

  • 💰 Competitive Salary & Equity
  • 💹 401(k) Program with 4% match
  • ⚕️ Health, Dental, Vision, and Life Insurance
  • 🚼 Paid Parental, Medical, and Caregiver Leave
  • 🏖️ Autonomous Work Environment