Tech Lead, AI Compute Infrastructure

Los Angeles, Palo Alto, San Francisco, Toronto, Singapore
Full-time · Senior · Artificial Intelligence

Description

You will build and scale the compute infrastructure powering HeyGen's state-of-the-art AI models, directly impacting model performance and video generation quality. You'll optimize GPU utilization across thousands of devices, develop scalable frameworks for compute jobs, and collaborate closely with AI researchers.

Requirements

  • Bachelor's degree in CS or related field, or equivalent experience
  • 5+ years of industry experience in large-scale MLOps, AI infrastructure, or HPC
  • Experience with data processing frameworks and tools such as Ray, Apache Spark, or LanceDB
  • Proficiency in Python and C++
  • Deep experience with Kubernetes and Ray

Responsibilities

  • Optimize GPU utilization across thousands of devices for inference, training, and data processing
  • Build scalable frameworks for managing heterogeneous compute jobs including data ingestion, training, and evaluation
  • Develop observability and tracing tools for compute clusters to diagnose performance bottlenecks
  • Collaborate with AI researchers to integrate acceleration techniques into production pipelines
  • Champion cloud-native and distributed computing technologies such as Kubernetes and Ray for elastic scaling of distributed systems