3h ago
Senior Reliability Engineer
Mexico City
$130k-$170k / year (est.)
full-time, senior, Remote, software
About This Role
You'll operate, observe, and improve the reliability of distributed systems on AWS and Kubernetes. You'll focus on understanding production behavior, detecting issues, and enabling automated scaling and recovery. This role emphasizes observability and operational maturity over greenfield infrastructure.
What You'll Do
- Design and improve observability strategies including metrics, logs, and traces.
- Identify failure modes, performance bottlenecks, and reliability risks.
- Evolve shared AWS CDK and CDK8s constructs for observability and autoscaling.
- Collaborate with teams on incident investigation and root cause analysis.
Requirements
- 5+ years of experience in Site Reliability Engineering or Platform Engineering.
- Strong experience with observability operations (metrics, logs, traces, alerts).
- Hands-on experience with AWS services (VPC, IAM, RDS, MSK, S3, CloudWatch).
- Python fluency and experience with AWS CDK/CDK8s or equivalent IaC.
Nice to Have
- Experience with Spark on Kubernetes, Argo, or Kafka-based batch pipelines.
Benefits & Perks
- 100% Remote Work
- Highly Competitive USD Pay
- Paid Time Off
- Work with Autonomy
- Work with Top American Companies