Staff Reliability Engineer

Bengaluru
Full-time, Senior, E-commerce

Description

You will define and drive the observability strategy, implement monitoring tools, run gap analyses, and build end-to-end observability. You will also collaborate with DevOps, SRE, and application teams and use observability data to keep IT services stable.

Requirements

  • Expertise in observability tools such as Prometheus, Grafana, Datadog, New Relic, or Splunk
  • Experience with infrastructure automation tools such as Terraform and Ansible
  • Proficiency in scripting languages: Python, Bash, PowerShell
  • Experience with cloud platforms: AWS, Azure, GCP
  • Knowledge of Kubernetes and Docker for monitoring containerized environments

Responsibilities

  • Define and drive observability strategy and roadmap
  • Lead implementation and optimization of observability platforms
  • Conduct gap assessments and implement automated solutions
  • Build end-to-end visibility across infrastructure, network, applications, and user journeys
  • Partner with DevOps, SRE, and application teams to embed observability into CI/CD pipelines