1d ago

Senior ML Engineer

France

✨ $180k-$230k / year (est.)

full-time · senior · Remote · software

🛠 Tech Stack

💼 About This Role

You'll lead LLM inference optimization at Kimchi, an autonomous system that matches workloads to the best model and serving config. Your work directly improves customer latency and margins. This high-autonomy role lets you set technical direction for inference optimization.

🎯 What You'll Do

  • Optimize throughput via continuous batching and speculative decoding
  • Reduce latency by profiling and fixing actual bottlenecks
  • Maximize KV cache efficiency with paged attention and prefix caching
  • Quantize models (INT8/INT4/FP8) while maintaining quality

📋 Requirements

  • 5+ years building production ML systems for inference or training
  • Strong Python for production services, not scripts
  • Hands-on experience with vLLM, SGLang, or TensorRT-LLM
  • Fluency in quantization tradeoffs and quality measurement

✨ Nice to Have

  • Deep understanding of CUDA kernel tuning
  • Experience with distributed inference across multiple nodes
  • Familiarity with Kubernetes and cloud infrastructure

🎁 Benefits & Perks

  • 💰 Competitive salary and equity options
  • 🌍 Remote-first global environment
  • 📚 Learning budget for conferences and courses
  • 🎮 10% time for personal projects
  • 🏖️ Extra days off for work-life balance

📨 Hiring Process

Estimated timeline: 2-4 weeks

  1. Recruiter screen · 30 min
  2. Hiring manager interview · 45 min
  3. Technical interview (system design) · 60 min
  4. Live coding · 60 min

This description was AI-summarized.
