AI Research Engineer Intern (PhD), Real-Time Inference for Embodied AI
Milpitas, CA
internship, intern, robotics
Description
You will research and develop techniques to enable real-time inference for embodied AI models deployed on robotic platforms, optimizing the performance of Vision-Language-Action (VLA) models, world models, multimodal transformer-based policies, and similar models. You will collaborate with robotics, infrastructure, and hardware teams to integrate optimized models into real robot stacks and edge systems, exploring tradeoffs between model quality and runtime efficiency for practical deployment.
Requirements
- Currently pursuing or recently completed a PhD in CS, EE, Robotics, ML, Systems, or related field
- Strong background in ML systems, model inference optimization, or efficient deep learning
- Experience optimizing modern ML models for production or low-latency deployment
- Hands-on experience with real-time inference systems, efficient transformer inference, model compression, pruning, quantization, distillation, GPU performance optimization, or deployment frameworks (TensorRT, ONNX Runtime, XLA, TVM, Triton, etc.)
- Proficiency with deep learning frameworks such as PyTorch, JAX, or TensorFlow, and strong programming and systems skills, including performance profiling and debugging
Responsibilities
- Research and develop techniques for real-time inference of embodied AI models on robotic platforms
- Optimize inference performance for VLA models, world models, multimodal transformer-based policies, and perception models
- Improve model latency, throughput, memory efficiency, and reliability via compression, quantization, distillation, batching, scheduling optimization, KV-cache/decoding optimization, graph compilation, and kernel-level acceleration
- Collaborate with robotics, infrastructure, and hardware teams to integrate optimized models into real robot stacks and edge/on-device systems
- Design benchmarking pipelines for evaluating end-to-end performance, including control frequency, action latency, and system robustness