1d ago

Senior ML Engineer

Spain

✨ $150k-$220k / year (est.)

full-time · senior · Remote · software

🛠 Tech Stack

💼 About This Role

You'll join the Kimchi team to optimize LLM inference performance, focusing on throughput, latency, and KV cache efficiency for a growing customer base. You'll lead the technical direction of inference optimization with high autonomy. This role directly impacts customer p99 latency and company margins.

🎯 What You'll Do

  • Push throughput via continuous batching and kernel-level tuning
  • Cut latency by profiling and fixing bottlenecks
  • Optimize KV cache utilization with paged attention and prefix caching
  • Quantize weights and activations without quality regression

📋 Requirements

  • 5+ years building production ML systems
  • Strong Python skills with production services experience
  • Hands-on experience with vLLM, SGLang, or TensorRT-LLM
  • Fluency in quantization tradeoffs and measurement

✨ Nice to Have

  • Distributed systems experience with collective communication
  • Knowledge of multi-GPU and multi-node inference
  • Self-direction in a wide-mandate role

🎁 Benefits & Perks

  • 💰 Competitive salary plus equity options
  • 🏖️ Flexible remote-first global environment
  • 📚 Learning budget for conferences and courses
  • 💻 Equipment budget for your home office
  • 🗓️ Extra days off for work-life balance

📨 Hiring Process

Estimated timeline: 3-5 weeks

  1. Screening call with recruiter · 30 min
  2. Hiring manager interview · 45 min
  3. Technical interview (system design) · 60 min
  4. Live coding · 60 min
  5. Culture check interview with an executive · 45 min

This description was AI-summarized.
