2h ago

Senior Site Reliability Engineer

Spain
full-time, senior, Remote, document workflow automation

Description

In this role, you will own the incident management process, maintain the observability stack, keep production applications running through on-call rotation, develop reliability automations, and collaborate with product engineers to foster SRE principles across the organization.

Requirements

  • Solid programming experience in Python (Django, AsyncIO) or Java (Spring Boot)
  • Experience with an observability tool suite (Loki, Grafana, Tempo, Mimir)
  • Experience developing and maintaining Python services in production
  • Strong experience with AWS and Kubernetes
  • Proficiency in relational databases (PostgreSQL) and messaging systems (RabbitMQ, NATS, Kafka)

Responsibilities

  • Own and influence the incident management process end-to-end
  • Maintain and evolve the on-prem observability stack (LGTM)
  • Participate in the on-call rotation for production applications
  • Develop automations and tools for platform reliability
  • Collaborate with product engineers to promote SRE principles