Senior Site Reliability Engineer
Austin, Texas, United States
Full-time, Senior, Video games
Description
You will architect and evolve the enterprise-wide observability platform to provide deep visibility into infrastructure and application performance for 2K's game platform and studios. You will design monitoring solutions, implement automation, and partner with development and operations teams to ensure reliability and performance.
Requirements
- 5+ years of professional experience in IT, including 3+ years in observability, monitoring, or site reliability engineering
- Deep knowledge of monitoring toolsets such as Prometheus, Grafana, ELK, Splunk, Dynatrace, Datadog
- Proficiency in Python for automation and tool development
- Hands-on experience with Kubernetes, Docker, and cloud platforms (AWS, GCP, or Azure)
- Strong understanding of networking, infrastructure, and performance optimization
- Experience with IaC tools such as Terraform
- Familiarity with configuration management tools (Ansible, Chef, Puppet) and CI/CD integration
- Proven track record designing dashboards, alerts, and performance reports
- Excellent communication skills
Responsibilities
- Architect, develop, and evolve the enterprise-wide observability platform
- Design and implement monitoring solutions with modern metrics and visualization technologies
- Collaborate with application and infrastructure teams to define observability standards and best practices
- Implement automation for monitoring configurations using IaC tools
- Integrate observability standards to unify metrics, logs, and traces
- Drive cost optimization initiatives around monitoring and logging
- Partner with developers and operations teams to enable self-service observability capabilities
- Create automation and alerting processes to proactively identify performance issues
- Participate in architectural reviews to embed observability into new services
- Contribute to documentation and knowledge sharing
- Deliver reports and visualizations for technical and business stakeholders
- Evaluate emerging technologies to evolve observability strategy
- Drive automation and process improvements for system performance and resiliency
- Select and integrate monitoring and telemetry tools