Senior Site Reliability Engineer

Ukraine
Full-time, Senior, Remote, Software

Description

You will own and influence the incident management process, maintain the observability stack, and ensure production reliability at PandaDoc. Your work will involve developing automations, contributing to service codebases, and mentoring others to foster SRE principles across the organization.

Requirements

  • Solid programming experience in Python (Django, AsyncIO) and/or Java (Spring Boot)
  • Experience maintaining an observability tool suite (LGTM: Loki, Grafana, Tempo, Mimir)
  • Experience developing and maintaining Python services in production
  • Strong experience with AWS and Kubernetes
  • Proficiency in relational databases (PostgreSQL) and messaging systems (RabbitMQ, NATS, Kafka)

Responsibilities

  • Own and influence the incident management process end-to-end
  • Maintain and evolve the on-prem observability stack
  • Keep production applications running smoothly by participating in the on-call rotation
  • Develop automations and tools to support platform reliability
  • Mentor SRE team members and product engineers