HPC System Engineer
Amsterdam, Netherlands
Full-time | Mid-level | Cloud Computing / AI Infrastructure
Description
You will benchmark GPU platforms for ML and AI workloads. This includes profiling GPU performance at the system and kernel level, comparing platforms built on different software stacks such as CUDA and ROCm, and running acceptance tests on new GPU clusters. Your work will support data-driven decisions on platform optimization and next-generation hardware development.
Requirements
- Proficient in Unix/Linux, Python, and Bash for automation
- Good understanding of the GPU software stack: CUDA, NCCL, drivers, and related libraries
- Proven ability to troubleshoot complex system issues (hardware, software, networking)
- Familiarity with containerized environments (Docker, Kubernetes)
Responsibilities
- Profile and analyze GPU performance at the system and kernel level
- Evaluate and compare GPU performance across different platforms and software stacks (e.g., CUDA, ROCm)
- Perform acceptance testing of new GPU clusters, covering performance, stability, and compatibility
- Assess the impact of interconnect strategies and system-level optimizations on performance and scalability