5h ago
Operations Engineer, Fleet Reliability
Poland
full-time, mid, cloud computing
Description
You will drive server nodes through provisioning and validation, troubleshooting hardware and software issues to maximize the uptime of high-performance supercomputing clusters. The role involves configuring and maintaining large-scale GPU clusters, working shifts from 7 am to 9 pm, and participating in on-call rotations. Onboarding training at the US headquarters is required within the first month.
Requirements
- 2+ years of experience in data center or on-prem infrastructure
- Strong Linux system administration and networking knowledge
- Ability to troubleshoot hardware and software issues
- Bachelor's degree or equivalent experience
- Ability to travel to the US on short notice (ESTA or B-1 visa)
Responsibilities
- Provision and validate batches of server nodes
- Troubleshoot node and cluster issues efficiently
- Configure and maintain large-scale GPU clusters
- Perform system maintenance tasks reliably
- Participate in on-call rotations, including after hours and weekends