1d ago

Senior ML Engineer

Poland

✨ $160k+ / year (est.)

full-time · senior · Remote · software

🛠 Tech Stack

💼 About This Role

You'll join the Kimchi team to optimize LLM inference performance, directly improving customer p99 latency and company margins. You'll own the technical direction of inference optimization, including kernel tuning, quantization, and scheduling. This is a high-impact, high-autonomy role where your work on KV cache utilization and throughput has immediate bottom-line effects.

🎯 What You'll Do

  • Push throughput via batching, speculative decoding, and kernel tuning on vLLM, SGLang, and TensorRT-LLM.
  • Attack latency by profiling and fixing actual bottlenecks (compute, memory, scheduling, networking).
  • Optimize KV cache with paged attention, prefix caching, eviction policies, and quantized KV.
  • Quantize weights and activations (INT8, INT4, FP8) while measuring quality on real workloads.
  • Scale inference across nodes with distributed topologies and network-aware placement.

📋 Requirements

  • 5+ years building ML inference or training infrastructure at production scale.
  • Strong Python skills with production services experience.
  • Hands-on experience with vLLM, SGLang, or TensorRT-LLM, and an understanding of inference engine performance.
  • Fluency with quantization tradeoffs, including measuring quality regressions.

✨ Nice to Have

  • Experience with distributed systems (collective communication, sharding, multi-GPU setups).
  • Bias toward measurement and instrumentation to distinguish real wins from artifacts.
  • Self-direction and excitement about a wide mandate.

🎁 Benefits & Perks

  • 💰 Competitive salary and equity options.
  • 🌍 Flexible remote-first global environment.
  • 📚 Learning budget for conferences and courses.
  • 🎉 Annual hackathon and team-building budget.
  • 🛠️ Equipment budget and extra days off.

📨 Hiring Process

Estimated timeline: 3-5 weeks

  1. Screening call with Recruiter · 30 min
  2. Hiring Manager interview · 45 min
  3. Technical interview (system design) · 60 min
  4. Live coding · 60 min
  5. Culture Check interview with an executive · 45 min

This description was AI-summarized.
