Site Reliability Engineer

Ciudad de México, Mexico

$100k-$150k / year (est.)

Full-time · Mid-level · Remote · Fintech / Payments

💼 About This Role

You'll ensure the reliability, scalability, and performance of our AWS-based platform by integrating observability and SRE best practices across the software lifecycle. You'll work closely with development teams to improve uptime and provide observability tooling.

🎯 What You'll Do

  • Design and maintain observability and monitoring for AWS infrastructure.
  • Define and track SLIs, SLOs, and SLAs for critical systems.
  • Build internal diagnostics and debugging tools for developers.
  • Manage scaling, performance, and resilience efforts.
  • Conduct disaster recovery testing and improve deployment strategies.

📋 Requirements

  • Expertise with Prometheus, Grafana, or OpenTelemetry.
  • Experience designing dashboards, alerts, and log aggregation pipelines.
  • Deep understanding of AWS services: ECS, Lambda, RDS, CodePipeline.
  • Strong proficiency in the Go programming language.
  • Experience defining SLIs, SLOs, and error budgets.

✨ Nice to Have

  • Experience with Chaos Monkey or Gremlin for failure drills.
  • Prior experience in a fast-paced startup environment.

🎁 Benefits & Perks

  • 🌐 Remote work
  • 🚀 Join a rapidly growing startup