1d ago

Senior ML Engineer

United Kingdom

✨ $170k-$230k / year (est.)

full-time · senior · Remote · software

🛠 Tech Stack

💼 About This Role

You'll optimize LLM inference performance for the Kimchi platform, continuously matching customer workloads to the most cost-efficient serving configuration. Your work directly improves margins and customer p99 latency.

🎯 What You'll Do

  • Push continuous batching and speculative decoding throughput
  • Profile and optimize TTFT and TPOT latency
  • Maximize KV cache utilization with paged attention and caching
  • Quantize weights and the KV cache using INT8/INT4/FP8 without quality regression

📋 Requirements

  • 5+ years building ML inference or training infrastructure
  • Strong Python skills for production services
  • Hands-on experience with vLLM, SGLang, or TensorRT-LLM
  • Fluency in quantization tradeoffs with measured quality impact

✨ Nice to Have

  • Experience with distributed inference topologies
  • Knowledge of GPU memory accounting and kernel tuning

🎁 Benefits & Perks

  • 💰 Competitive salary with equity options
  • 🌍 Remote-first global environment
  • 📚 Learning budget for conferences and courses
  • ⚡ Fast-paced workflow with features shipped in 1-4 weeks
  • 🎮 Annual hackathon and team-building events

📨 Hiring Process

Estimated timeline: 2-4 weeks

  1. Screening call with Recruiter · 30 min
  2. Hiring Manager interview · 45 min
  3. Technical interview (system design) · 60 min
  4. Live coding · 60 min
  5. Culture Check interview with executive · 45 min

This description was AI-summarized.
