1d ago

Senior ML Engineer

United Kingdom

✨ $170k-$230k / year (est.)

full-time · senior · remote · software

💼 About This Role

You'll optimize LLM inference performance for the Kimchi platform, continuously matching customer workloads to the most cost-efficient serving configuration. Your work directly improves margins and customer p99 latency.

🎯 What You'll Do

Push throughput with continuous batching and speculative decoding
Profile and optimize TTFT and TPOT latency
Maximize KV cache utilization with paged attention and caching
Quantize weights and the KV cache to INT8/INT4/FP8 without quality regression

📋 Requirements

5+ years building ML inference or training infrastructure
Strong Python skills for production services
Hands-on experience with vLLM, SGLang, or TensorRT-LLM
Fluency in quantization tradeoffs with measured quality impact

✨ Nice to Have

Experience with distributed inference topologies
Knowledge of GPU memory accounting and kernel tuning

🎁 Benefits & Perks

💰 Competitive salary with equity options
🌍 Remote-first global environment
📚 Learning budget for conferences and courses
⚡ Fast-paced workflow with features shipping in 1-4 weeks
🎮 Annual hackathon and team-building events

📨 Hiring Process

Estimated timeline: 2-4 weeks

1. Screening call with recruiter · 30 min
2. Hiring manager interview · 45 min
3. Technical interview (system design) · 60 min
4. Live coding · 60 min
5. Culture check interview with executive · 45 min

This description was AI-summarized.