
Software Engineer, Inference – AMD GPU Enablement

San Francisco

$295K–$555K / year

full-time · senior · ai-ml

🛠 Tech Stack

HIP · CUDA · Triton · vLLM · NCCL/RCCL

💼 About This Role

You'll scale and optimize OpenAI's inference infrastructure across emerging GPU platforms. Your core impact will be ensuring large models run efficiently on AMD hardware through low-level kernel optimization and distributed execution. This high-impact role lets you shape multi-platform inference capabilities from the ground up.

🎯 What You'll Do

  • Own bring-up, correctness, and performance of the inference stack on AMD hardware.
  • Integrate model-serving infrastructure (e.g., vLLM, Triton) into GPU-backed systems.
  • Debug and optimize distributed inference workloads across memory, network, and compute.
  • Design and optimize high-performance GPU kernels using HIP, Triton, or CUDA.

📋 Requirements

  • GPU kernel experience with HIP, CUDA, or Triton.
  • Familiarity with communication libraries like NCCL/RCCL.
  • Experience with distributed inference systems and scaling across GPUs.
  • Proven ability to solve end-to-end performance challenges across hardware and software layers.

✨ Nice to Have

  • Contributions to open-source libraries like RCCL, Triton, or vLLM.
  • Experience with GPU performance tools (Nsight, rocprof, perf).
  • Experience deploying inference on non-NVIDIA GPU environments.

🎁 Benefits & Perks

  • 💰 Competitive salary ($295K–$555K plus equity)
  • 🏖️ Flexible PTO
  • 💻 Remote-friendly (optional)
  • 📈 Equity grants
  • 🏥 Health insurance