16h ago
Machine Learning Infrastructure Engineer
Redwood City, CA
$150k-$350k / year
Full-time · Senior · AI/ML
💼 About This Role
You'll design, build, and maintain training and serving infrastructure for ML research at a fast-growing AI company. You'll maximize GPU utilization and build tooling to diagnose cluster issues. Over 20 million users interact with our characters monthly.
🎯 What You'll Do
- Provide infrastructure support to ML research and product
- Build tooling to diagnose cluster issues and hardware failures
- Monitor deployments, manage experiments, and support research
- Maximize GPU allocation and utilization for serving and training
📋 Requirements
- 4+ years of experience supporting ML infrastructure
- Experience developing tools to diagnose ML infrastructure problems
- Experience with cloud services such as Compute Engine, Kubernetes, and Cloud Storage
- Experience working with GPUs
✨ Nice to Have
- Experience with large GPU clusters and HPC/networking
- Experience supporting large language model training
- Experience with ML frameworks such as PyTorch, TensorFlow, or JAX