Tech Lead, AI Compute Infrastructure
Los Angeles, Palo Alto, San Francisco, Toronto, Singapore
Full-time · Senior · Artificial Intelligence
Description
You will build and scale the compute infrastructure powering HeyGen's state-of-the-art AI models, directly impacting model performance and video generation quality. You'll optimize GPU utilization across thousands of devices, develop scalable frameworks for compute jobs, and collaborate closely with AI researchers.
Requirements
- Bachelor's degree in CS or related field, or equivalent experience
- 5+ years of industry experience in large-scale MLOps, AI infrastructure, or HPC
- Experience with data frameworks such as Ray, Apache Spark, and LanceDB
- Proficiency in Python and C++
- Deep experience with Kubernetes and Ray
Responsibilities
- Optimize GPU utilization across thousands of devices for inference, training, and data processing
- Build scalable frameworks for managing heterogeneous compute jobs including data ingestion, training, and evaluation
- Develop observability and tracing tools for compute clusters to diagnose performance bottlenecks
- Collaborate with AI researchers to integrate acceleration techniques into production pipelines
- Champion cloud and container technologies (Kubernetes, Ray) for elastic scaling of distributed systems