Site Reliability Engineer
Ciudad de México, Mexico
✨ $100k-$150k / year (est.)
Full-time, mid-level, remote, Fintech / Payments
💼 About This Role
You'll ensure the reliability, scalability, and performance of our AWS-based platform by integrating observability and SRE best practices across the software lifecycle. You'll work closely with development teams to improve uptime and provide observability tooling.
🎯 What You'll Do
- Design and maintain observability and monitoring for AWS infrastructure.
- Define and track SLIs, SLOs, and SLAs for critical systems.
- Provide developers with internal tools for diagnostics and debugging.
- Manage scaling, performance, and resilience efforts.
- Conduct disaster recovery testing and improve deployment strategies.
📋 Requirements
- Expertise with Prometheus, Grafana, or OpenTelemetry.
- Experience designing dashboards, alerts, and log aggregation pipelines.
- Deep understanding of AWS services: ECS, Lambda, RDS, CodePipeline.
- Strong proficiency in the Go programming language.
- Experience defining SLIs, SLOs, and error budgets.
✨ Nice to Have
- Experience with Chaos Monkey or Gremlin for failure drills.
- Prior experience in a fast-paced startup environment.
🎁 Benefits & Perks
- 🌐 Remote work
- 🚀 Join a rapidly growing startup