Member of Technical Staff (Inference)
Paris
✨ $150k–$250k / year (est.)
Full-time · Hybrid · AI/ML
💼 About This Role
You'll develop and optimize inference pipelines for H's agentic AI models, focusing on high throughput and low latency. You'll collaborate with research teams to improve model efficiency using techniques such as quantization and distributed computing.
🎯 What You'll Do
- Develop scalable, low-latency inference pipelines
- Optimize model performance using quantization and caching
- Write specialized GPU kernels for attention and matmul
- Implement state-of-the-art inference techniques
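To give a flavor of the quantization work listed above, here is a minimal sketch of symmetric per-tensor int8 weight quantization in Python. This is purely illustrative (function names and approach are assumptions, not H's actual stack); production serving systems typically use per-channel or block-wise schemes instead:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 weights."""
    return q.astype(np.float32) * scale

# Rounding error per element is bounded by half the quantization step.
w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
```

The payoff in serving is memory: int8 weights take a quarter of the space of float32, which raises the batch size (and therefore throughput) a single GPU can sustain.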
📋 Requirements
- MS or PhD in Computer Science or related field
- Proficiency in Python, Rust, or C/C++
- Experience with GPU programming (CUDA, Triton, Metal)
- Experience with model compression and quantization
✨ Nice to Have
- Experience with LLM serving frameworks (vLLM, TensorRT-LLM)
- Experience with CUDA kernel programming and NCCL
- Experience with deep learning inference frameworks (PyTorch, ONNX Runtime)
🎁 Benefits & Perks
- 🚀 Join a top AI startup in early days
- 🌍 Collaborate with world-class AI talent
- 📈 Professional growth opportunities