1d ago

Production Engineer/Site Reliability Engineer (Shift Basis)

Bangalore
full-time · mid · cybersecurity

Description

Join a 24/7 Production Operations team to manage critical infrastructure across multi-cloud environments, lead incident response, and drive automation to improve system reliability and uptime.

Requirements

  • Solid understanding of distributed system concepts
  • Experience running production systems on public cloud infrastructure
  • Familiarity with Kubernetes and container orchestration
  • Hands-on experience with Terraform and CloudFormation
  • Proficiency in Python, UNIX, networking, and databases such as MySQL

Responsibilities

  • Manage and support critical infrastructure in multi-cloud environments
  • Implement observability solutions for monitoring, alerting, and metrics
  • Lead incident management and coordinate resolution across teams
  • Analyze recurring incidents to identify root causes and reduce toil
  • Design automation tools to detect, triage, and remediate production issues