Member of Technical Staff (AI Inference Engineer)
London
✨ $150k–$250k / year (est.)
full-time · mid · ai-ml
🛠 Tech Stack
Rust · CUDA · CuTe DSL
💼 About This Role
You'll join the team that builds and runs the inference engine behind every Perplexity query, deploying dozens of model architectures at scale with tight latency and cost budgets. You'll own problems end-to-end, from reading a research paper to writing GPU kernels and debugging production incidents.
🎯 What You'll Do
- Support transformer-based retrieval, text-generation, and multimodal models in inference infrastructure.
- Port in-house CUDA kernels to NVIDIA's CuTe DSL for portability.
- Develop the Rust-based inference server to handle growing traffic.
- Profile and fix bottlenecks from network ingress through GPU kernel interleaving.
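To give a flavor of the server-side work, here is a minimal sketch in Rust of batch admission logic — the kind of decision an inference server makes when grouping queued requests into a GPU batch under a token budget. The `Request` shape, `next_batch` name, and limits are illustrative assumptions, not Perplexity's actual implementation.

```rust
use std::collections::VecDeque;

/// A single inference request (hypothetical shape for illustration).
struct Request {
    id: u64,
    prompt_tokens: usize,
}

/// Drain up to `max_batch` requests whose combined prompt-token count
/// fits within `max_tokens` — admission logic that keeps GPU batches
/// full without exceeding a memory budget.
fn next_batch(
    queue: &mut VecDeque<Request>,
    max_batch: usize,
    max_tokens: usize,
) -> Vec<Request> {
    let mut batch = Vec::new();
    let mut tokens = 0;
    while batch.len() < max_batch {
        match queue.front() {
            Some(req) if tokens + req.prompt_tokens <= max_tokens => {
                tokens += req.prompt_tokens;
                batch.push(queue.pop_front().unwrap());
            }
            _ => break,
        }
    }
    batch
}

fn main() {
    let mut queue: VecDeque<Request> = (0..5)
        .map(|id| Request { id, prompt_tokens: 300 })
        .collect();
    // With a 1000-token budget, only three 300-token requests fit.
    let batch = next_batch(&mut queue, 8, 1000);
    println!("batched {} requests, {} left queued", batch.len(), queue.len());
}
```

Real servers layer continuous batching, per-request deadlines, and KV-cache accounting on top of this, but the budget check is the core of the interleaving problem the role describes.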
📋 Requirements
- 3+ years professional software engineering with ML inference or high-performance systems.
- GPU programming experience with CUDA, Triton, or CUTLASS.
- Familiarity with a deep learning framework (PyTorch, JAX, or TensorFlow).
- Understanding of LLM architectures and inference optimization techniques.
✨ Nice to Have
- ML compilers and framework internals (PyTorch internals, torch.compile).
- Distributed GPU communication (NCCL, NVLink, InfiniBand, RDMA).
- Low-precision inference (INT8/FP8/FP4 quantization).
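As a concrete example of the low-precision work mentioned above, here is a minimal Rust sketch of symmetric per-tensor INT8 quantization: a single scale maps the largest absolute weight onto the int8 range [-127, 127]. This illustrates the general technique only, not any particular production scheme.

```rust
/// Symmetric per-tensor INT8 quantization (illustrative sketch).
/// Returns the quantized values and the scale needed to recover them.
fn quantize(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Recover approximate f32 values from the int8 representation.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.02f32, -1.5, 0.75, 1.27];
    let (q, scale) = quantize(&weights);
    let recovered = dequantize(&q, scale);
    // Round-trip error is bounded by scale/2 per element.
    let max_err = weights
        .iter()
        .zip(&recovered)
        .map(|(a, b)| (a - b).abs())
        .fold(0f32, f32::max);
    println!("scale = {scale:.5}, max round-trip error = {max_err:.5}");
}
```

FP8/FP4 schemes trade this uniform grid for floating-point spacing, and production kernels typically quantize per-channel or per-block, but the scale-and-round round trip is the same idea.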
🎁 Benefits & Perks
- 💰 Competitive base salary with equity.
- 🏖️ Flexible work environment (hybrid/remote possible).
- 📈 Growth opportunities at a fast-moving AI startup.
- 🔬 Cutting-edge tech stack (Rust, CUDA, CuTe DSL).