2h ago
Senior Site Reliability Engineer
Remote (Germany)
full-time, senior, Remote, Document workflow automation
Description
You'll own incident management, maintain the observability stack, and develop automations to keep production applications reliable. Collaborate with product engineers, mentor teammates, and contribute to codebases to prevent incidents and optimize performance.
Requirements
- Solid programming experience in Python (Django, AsyncIO) and/or Java (Spring Boot)
- Experience maintaining the LGTM observability suite (Loki, Grafana, Tempo, Mimir)
- Strong experience with AWS and Kubernetes
- Proficiency in PostgreSQL and messaging systems (RabbitMQ, NATS, Kafka)
- Hands-on troubleshooting of distributed systems in production
Responsibilities
- Own and influence the incident management process end-to-end
- Maintain and evolve the on-prem observability stack (LGTM)
- Participate in on-call rotation for production applications
- Develop automations and tools for platform reliability
- Collaborate with product engineers to foster SRE principles