Generative AI Inference Engineer

United States
Full-time · Senior · Remote · Artificial Intelligence

Description

You will design and develop customer-facing, multi-modal ML inference systems for creative generative AI applications, applying cutting-edge optimization techniques and high-performance computing resources to push the boundaries of what's possible.

Requirements

  • 7+ years productionizing machine learning systems including inference pipelines
  • Expert-level Python for scalable services
  • 5+ years with PyTorch and high-performance inference frameworks (e.g., Triton, TensorRT)
  • Deep understanding of diffusion model architectures
  • Experience profiling and optimizing neural networks on NVIDIA GPUs (e.g., with Nsight)

Responsibilities

  • Lead design and development of customer-facing multi-modal ML inference systems
  • Build inference systems for next-generation models, focusing on optimization and deployment
  • Partner with cloud providers to deliver hosted inference solutions
  • Prototype and productionize inference platform improvements and new features
  • Drive business impact through strategic thought partnership across the organization