Senior Site Reliability Engineer
New York City
Full-time, Senior, Financial technology
Description
You will design and build automated systems that manage infrastructure at scale, reduce operational toil, and create internal platforms that enable self-service changes. You will focus on improving the reliability and resilience of Kubernetes clusters, databases, and services, contribute to architecture decisions, and participate in on-call rotations with a proactive, incident-prevention mindset.
Requirements
- 5+ years of experience in infrastructure, SRE, or software engineering
- Strong software engineering skills, with experience building systems rather than just scripts
- Experience managing production infrastructure at scale (cloud + containerized)
- Experience with Infrastructure as Code (e.g., Terraform)
- Experience with distributed systems (Docker/Kubernetes) and observability tools (Datadog, CloudWatch, ELK)
Responsibilities
- Design and build systems to automate infrastructure management at scale
- Reduce operational toil by turning manual processes into reliable workflows
- Build internal tooling and platforms for safe self-service changes
- Improve reliability and resilience of Kubernetes, databases, and services
- Implement and evolve systems for deploying and running applications in Kubernetes