1d ago

Senior ML Engineer

France

✨ $180k-$230k / year (est.)

Full-time · Senior · Remote · Software

💼 About This Role

You'll lead LLM inference optimization at Kimchi, an autonomous system that matches workloads to the best model and serving config. Your work directly improves customer latency and margins. This high-autonomy role lets you set technical direction for inference optimization.

🎯 What You'll Do

Optimize throughput via continuous batching and speculative decoding
Reduce latency by profiling and fixing actual bottlenecks
Maximize KV cache efficiency with paged attention and prefix caching
Quantize models (INT8/INT4/FP8) while maintaining quality

📋 Requirements

5+ years building production ML systems for inference or training
Strong Python for production services, not scripts
Hands-on experience with vLLM, SGLang, or TensorRT-LLM
Fluency in quantization tradeoffs and quality measurement

✨ Nice to Have

Deep understanding of CUDA kernel tuning
Experience with distributed inference across multiple nodes
Familiarity with Kubernetes and cloud infrastructure

🎁 Benefits & Perks

💰 Competitive salary and equity options
🌍 Remote-first global environment
📚 Learning budget for conferences and courses
🎮 10% time for personal projects
🏖️ Extra days off for work-life balance

📨 Hiring Process

Estimated timeline: 2-4 weeks

1. Recruiter screen · 30 min
2. Hiring manager interview · 45 min
3. Technical interview (system design) · 60 min
4. Live coding · 60 min

This description was AI-summarized.
