1d ago
Senior ML Engineer
Spain
$150k-$220k / year (est.)
full-time · senior · Remote · software
About This Role
You'll join the Kimchi team to optimize LLM inference performance, focusing on throughput, latency, and KV cache efficiency for a growing customer base. You'll lead the technical direction of inference optimization with high autonomy. This role directly impacts customer p99 latency and company margins.
What You'll Do
- Push throughput via continuous batching and kernel-level tuning
- Cut latency by profiling and fixing bottlenecks
- Optimize KV cache utilization with paged attention and prefix caching
- Quantize weights and activations without quality regression
Requirements
- 5+ years building production ML systems
- Strong Python skills with production services experience
- Hands-on experience with vLLM, SGLang, or TensorRT-LLM
- Fluency in quantization tradeoffs and measurement
Nice to Have
- Distributed systems experience with collective communication
- Knowledge of multi-GPU and multi-node inference
- Self-direction in a role with a wide mandate
Benefits & Perks
- Competitive salary plus equity options
- Flexible remote-first global environment
- Learning budget for conferences and courses
- Equipment budget for your home office
- Extra days off for work-life balance
Hiring Process
Estimated timeline: 3-5 weeks
- 1. Screening call with Recruiter · 30 min
- 2. Hiring Manager interview · 45 min
- 3. Technical interview (system design) · 60 min
- 4. Live coding · 60 min
- 5. Culture Check interview with executive · 45 min
This description was AI-summarized.