Production Engineer/Site Reliability Engineer (Shift Basis)
Bangalore, India
Full-time, mid-level, Data Security / Cloud Computing
Description
You'll join a 24/7 Production Operations team responsible for managing and supporting critical infrastructure in multi-cloud environments. You'll lead incident management, design automation tools, and implement observability solutions to maximize uptime and reliability.
Requirements
- Solid understanding of distributed system concepts
- Hands-on experience with Kubernetes and infrastructure management tools like Terraform
- Strong analytical and problem-solving skills for diagnosing system issues
- Proficiency in Python programming
- Knowledge of data structures, algorithms, UNIX, networking, operating systems, and databases such as MySQL
Responsibilities
- Manage and support critical infrastructure and services in multi-cloud environments
- Implement and maintain observability solutions for monitoring, alerting, and metrics collection
- Lead incident management: respond to alerts and outages, and coordinate teams to resolve them
- Analyze incidents to identify root causes and improve system resilience
- Design and develop automation tools to detect, triage, and remediate production issues