Senior Site Reliability Engineer

Remote (Germany)
Full-time · Senior · Remote · Document workflow automation

Description

You'll own incident management, maintain the observability stack, and develop automations that keep production applications reliable. You'll also collaborate with product engineers, mentor teammates, and contribute to the codebases to prevent incidents and optimize performance.

Requirements

  • Solid programming experience in Python (Django, AsyncIO) and/or Java (Spring Boot)
  • Experience maintaining the LGTM observability stack (Loki, Grafana, Tempo, Mimir)
  • Strong experience with AWS and Kubernetes
  • Proficiency in PostgreSQL and messaging systems (RabbitMQ, NATS, Kafka)
  • Hands-on troubleshooting of distributed systems in production

Responsibilities

  • Own and influence the incident management process end-to-end
  • Maintain and evolve the on-prem observability stack (LGTM)
  • Participate in on-call rotation for production applications
  • Develop automations and tools for platform reliability
  • Collaborate with product engineers to foster adoption of SRE principles