Staff Reliability Engineer
Bengaluru
full-time · senior · e-commerce
Description
You will define and drive the observability strategy, implement monitoring tools, conduct gap analyses, build end-to-end observability, collaborate with partner teams, and use data-driven insights to ensure the stability of IT services.
Requirements
- Expertise in Prometheus, Grafana, Datadog, New Relic, Splunk, or similar observability tools
- Experience with infrastructure automation tools such as Terraform and Ansible
- Proficiency in scripting languages: Python, Bash, PowerShell
- Experience with cloud platforms: AWS, Azure, GCP
- Knowledge of Kubernetes and Docker for monitoring containerized environments
Responsibilities
- Define and drive observability strategy and roadmap
- Lead implementation and optimization of observability platforms
- Conduct gap assessments and implement automated solutions
- Build end-to-end visibility across infrastructure, network, applications, and user journeys
- Partner with DevOps, SRE, and application teams to embed observability into CI/CD pipelines