Production Engineer/Site Reliability Engineer (Shift Basis)

Bangalore, India
Full-time · Mid-level · Data Security / Cloud Computing

Description

You'll join a 24/7 Production Operations team responsible for managing and supporting critical infrastructure in multi-cloud environments. You will lead incident management, design automation tools, and implement observability solutions to maximize uptime and reliability.

Requirements

  • Solid understanding of distributed system concepts
  • Hands-on experience with Kubernetes and infrastructure management tools like Terraform
  • Strong analytical and problem-solving skills for diagnosing system issues
  • Proficiency in Python programming
  • Knowledge of data structures, algorithms, UNIX, networking, operating systems, and databases such as MySQL

Responsibilities

  • Manage and support critical infrastructure and services in multi-cloud environments
  • Implement and maintain observability solutions for monitoring, alerting, and metrics collection
  • Lead incident management: respond to alerts and outages, and coordinate teams through resolution
  • Analyze incidents to identify root causes and improve system resilience
  • Design and develop automation tools to detect, triage, and remediate production issues