1d ago
Senior ML Engineer
France
$180k-$230k / year (est.)
full-time, senior, remote, software
Tech Stack
About This Role
You'll lead LLM inference optimization at Kimchi, an autonomous system that matches workloads to the best model and serving config. Your work directly improves customer latency and margins. This high-autonomy role lets you set technical direction for inference optimization.
What You'll Do
- Optimize throughput via continuous batching and speculative decoding
- Reduce latency by profiling and fixing actual bottlenecks
- Maximize KV cache efficiency with paged attention and prefix caching
- Quantize models (INT8/INT4/FP8) while maintaining quality
Requirements
- 5+ years building production ML systems for inference or training
- Strong Python skills for building production services, not just scripts
- Hands-on experience with vLLM, SGLang, or TensorRT-LLM
- Fluency in quantization tradeoffs and quality measurement
Nice to Have
- Deep understanding of CUDA kernel tuning
- Experience with distributed inference across multiple nodes
- Familiarity with Kubernetes and cloud infrastructure
Benefits & Perks
- Competitive salary and equity options
- Remote-first global environment
- Learning budget for conferences and courses
- 10% time for personal projects
- Extra days off for work-life balance
Hiring Process
Estimated timeline: 2-4 weeks
- 1. Recruiter screen · 30 min
- 2. Hiring manager interview · 45 min
- 3. Technical interview (system design) · 60 min
- 4. Live coding · 60 min
This description was AI-summarized.