Software Engineer, Inference – AMD GPU Enablement
San Francisco
$295K–$555K / year
full-time · senior · ai-ml
🛠 Tech Stack
HIP · CUDA · Triton · vLLM · NCCL/RCCL
💼 About This Role
You'll scale and optimize OpenAI's inference infrastructure across emerging GPU platforms. Your core impact will be making large models run efficiently on AMD hardware through low-level kernel optimization and distributed execution tuning. This high-impact role lets you shape multi-platform inference capabilities from the ground up.
🎯 What You'll Do
- Own bring-up, correctness, and performance of the inference stack on AMD hardware.
- Integrate model-serving infrastructure (e.g., vLLM, Triton) into GPU-backed systems.
- Debug and optimize distributed inference workloads across memory, network, and compute.
- Design and optimize high-performance GPU kernels using HIP, Triton, or CUDA (see the sketch after this list).
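For a flavor of the kernel work involved, here is a minimal sketch (ours, not from the posting) of an element-wise add kernel in Triton, which compiles to AMD GPUs through the ROCm backend as well as to NVIDIA GPUs:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the input.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Real inference kernels (attention, quantized GEMMs) are far more involved, but the same launch and masking machinery applies.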
📋 Requirements
- GPU kernel experience with HIP, CUDA, or Triton.
- Familiarity with communication libraries like NCCL/RCCL (a minimal usage sketch follows this list).
- Experience with distributed inference systems and scaling across GPUs.
- Proven ability to solve end-to-end performance challenges across hardware and software layers.
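For context on the NCCL/RCCL point, a hedged sketch: in PyTorch, the `nccl` backend name dispatches to NCCL on CUDA builds and to RCCL on ROCm builds, so a multi-GPU all-reduce looks the same on both. The script below is illustrative, not from the posting:

```python
import os
import torch
import torch.distributed as dist

def main():
    # Assumes launch via torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK.
    dist.init_process_group(backend="nccl")  # maps to RCCL on ROCm builds
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # ROCm PyTorch reuses the torch.cuda namespace

    # Each rank contributes a tensor; all_reduce sums them in place on every rank.
    t = torch.ones(4, device="cuda") * dist.get_rank()
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: {t.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with e.g. `torchrun --nproc_per_node=2 allreduce_demo.py` (filename hypothetical), every rank prints the same summed tensor.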
✨ Nice to Have
- Contributions to open-source libraries like RCCL, Triton, or vLLM.
- Experience with GPU performance tools (Nsight, rocprof, perf).
- Experience deploying inference on non-NVIDIA GPU environments.
🎁 Benefits & Perks
- 💰 Competitive salary ($295K–$555K plus equity)
- 🏖️ Flexible PTO
- 💻 Remote-friendly
- 📈 Equity grants
- 🏥 Health insurance