Senior / Staff AI Research Engineer, Real-Time Inference

Milpitas, CA
Full-time · Senior · Robotics

Description

In this role, you will drive the full stack of model optimization—from CUDA kernel engineering to quantization and compression—to deploy high-performance AI models on edge compute platforms powering RoboForce robots in the field.

Requirements

  • Master's degree in CS, EE, or a related field with 4+ years of experience, or a PhD.
  • Deep expertise in CUDA, GPU architecture, and low-level kernel optimization.
  • Hands-on experience with TensorRT, ONNX Runtime, TVM, or Triton for model quantization and deployment.
  • Proficiency in C++ and Python with strong systems programming skills.
  • Experience deploying ML models on edge/embedded hardware (e.g., NVIDIA Jetson, Orin).

Responsibilities

  • Develop and optimize real-time inference pipelines for embodied AI models on edge hardware (e.g., NVIDIA Jetson).
  • Implement CUDA-level custom kernels, memory layout tuning, and hardware-aware graph compilation to minimize latency.
  • Apply model compression techniques including quantization, pruning, distillation, and structured sparsity.
  • Profile and debug inference stacks using tools like Nsight, TensorRT, and Triton to eliminate performance bottlenecks.
  • Collaborate with ML research and robotics teams to co-design architectures meeting real-time control-loop requirements.
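To illustrate the kind of model-compression work described above, here is a minimal sketch of symmetric per-tensor int8 post-training quantization in plain Python. The function names and the single-scale scheme are illustrative assumptions, not RoboForce's actual pipeline; in practice this role would use TensorRT's calibration and quantization tooling rather than hand-rolled code:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    # One scale factor shared across the whole tensor (per-tensor scheme).
    scale = max(abs(w) for w in weights) / 127.0
    # Round to the nearest integer and clip into the signed int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights; per-element error is at most scale/2."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The per-tensor scale keeps the scheme simple; per-channel scales (one per output channel) typically recover more accuracy and are what deployment toolchains use in practice.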