Senior / Staff AI Research Engineer, Real-Time Inference
Milpitas, CA
full-time · senior · robotics
Description
In this role, you will own the full model-optimization stack, from CUDA kernel engineering to quantization and compression, deploying high-performance AI models on the edge compute platforms that power RoboForce robots in the field.
Requirements
- Master's degree in CS, EE, or a related field with 4+ years of experience, or a PhD.
- Deep expertise in CUDA, GPU architecture, and low-level kernel optimization.
- Hands-on experience with TensorRT, ONNX Runtime, TVM, or Triton for model quantization and deployment.
- Proficiency in C++ and Python with strong systems programming skills.
- Experience deploying ML models on edge/embedded hardware (e.g., NVIDIA Jetson, Orin).
Responsibilities
- Develop and optimize real-time inference pipelines for embodied AI models on edge hardware (e.g., NVIDIA Jetson).
- Implement CUDA-level custom kernels, memory layout tuning, and hardware-aware graph compilation to minimize latency.
- Apply model compression techniques including quantization, pruning, distillation, and structured sparsity.
- Profile and debug inference stacks using tools such as Nsight, TensorRT, and Triton to eliminate performance bottlenecks.
- Collaborate with ML research and robotics teams to co-design architectures meeting real-time control-loop requirements.