Generative AI Inference Engineer
United States
Full-time · Senior · Remote · Artificial Intelligence
Description
You will design and develop customer-facing, multi-modal ML inference systems for creative generative AI applications, applying state-of-the-art optimization techniques and high-performance computing resources to push the boundaries of what's possible.
Requirements
- 7+ years productionizing machine learning systems including inference pipelines
- Expert-level Python for scalable services
- 5+ years with PyTorch and high-performance inference frameworks (e.g., Triton, TensorRT)
- Deep understanding of diffusion model architectures
- Experience profiling and optimizing neural networks on NVIDIA GPUs (e.g., with Nsight)
Responsibilities
- Lead design and development of customer-facing multi-modal ML inference systems
- Build inference systems for next-generation models, focusing on optimization and deployment
- Partner with cloud providers to deliver hosted inference solutions
- Prototype and productionize inference platform improvements and new features
- Drive business impact through strategic thought partnership across the organization