ML Systems Engineer, ML Acceleration

Singapore, Central, Singapore
Full-time · Mid · Autonomous vehicles

Description

You will optimize the core systems that enable researchers to train frontier models at scale, focusing on speed, cost, reliability, and throughput. Your work will directly impact large-scale distributed model training and reduce time-to-convergence for next-generation models.

Requirements

  • Bachelor's, Master's, or PhD in Computer Science or related field
  • Strong proficiency in Python
  • Extensive hands-on experience with PyTorch
  • Experience optimizing ML model execution during training and inference
  • Exceptional analytical and problem-solving skills with a data-driven approach

Responsibilities

  • Profile and optimize performance using tools like Nsight and PyTorch Profiler
  • Optimize distributed training pipelines with PyTorch Distributed
  • Design and maintain high-performance GPU kernels in Triton or CUDA
  • Optimize data loading pipelines for robustness and maximum training throughput