2h ago
Senior Site Reliability Engineer
Spain
full-time, senior, Remote, document workflow automation
Description
In this role, you will own the incident management process, maintain the observability stack, keep production applications running through on-call rotation, develop reliability automations, and collaborate with product engineers to foster SRE principles across the organization.
Requirements
- Solid programming experience in Python (Django, AsyncIO) or Java (Spring Boot)
- Experience with the observability tool suite (Loki, Grafana, Tempo, Mimir)
- Experience developing and maintaining Python services in production
- Strong experience with AWS and Kubernetes
- Proficiency in relational databases (PostgreSQL) and messaging systems (RabbitMQ, NATS, Kafka)
Responsibilities
- Own and influence the incident management process end-to-end
- Maintain and evolve the on-prem observability stack (LGTM)
- Participate in on-call rotation for production applications
- Develop automations and tools for platform reliability
- Collaborate with product engineers to promote SRE principles