Senior Site Reliability Engineer

New York City
Full-time, senior, financial technology

Description

You will design and build automated systems that manage infrastructure at scale, reduce operational toil, and power internal platforms for self-service changes. Your focus will be on improving the reliability and resilience of Kubernetes clusters, databases, and services. You will also contribute to architecture decisions and take part in on-call rotations with a proactive, incident-prevention mindset.

Requirements

  • 5+ years of experience in infrastructure, SRE, or software engineering
  • Strong software engineering skills: you build systems, not just scripts
  • Experience managing production infrastructure at scale (cloud + containerized)
  • Experience with Infrastructure as Code (e.g., Terraform)
  • Experience with distributed systems (Docker/Kubernetes) and observability tools (Datadog, CloudWatch, ELK)

Responsibilities

  • Design and build systems to automate infrastructure management at scale
  • Reduce operational toil by turning manual processes into reliable workflows
  • Build internal tooling and platforms for safe self-service changes
  • Improve reliability and resilience of Kubernetes, databases, and services
  • Implement and evolve systems for deploying and running applications in Kubernetes