1d ago
Senior ML Engineer
United Kingdom
$170k-$230k / year (est.)
full-time · senior · Remote · software
About This Role
You'll optimize LLM inference performance for the Kimchi platform, continuously matching customer workloads to the most cost-efficient serving configuration. Your work directly improves margins and customer p99 latency.
What You'll Do
- Increase throughput with continuous batching and speculative decoding
- Profile and optimize TTFT and TPOT latency
- Maximize KV cache utilization with paged attention and caching
- Quantize weights and the KV cache to INT8/INT4/FP8 without quality regression
Requirements
- 5+ years building ML inference or training infrastructure
- Strong Python skills for production services
- Hands-on experience with vLLM, SGLang, or TensorRT-LLM
- Fluency in quantization tradeoffs with measured quality impact
Nice to Have
- Experience with distributed inference topologies
- Knowledge of GPU memory accounting and kernel tuning
Benefits & Perks
- Competitive salary with equity options
- Remote-first global environment
- Learning budget for conferences and courses
- Fast-paced workflow with features shipped in 1-4 weeks
- Annual hackathon and team-building events
Hiring Process
Estimated timeline: 2-4 weeks
- 1. Screening call with recruiter · 30 min
- 2. Hiring manager interview · 45 min
- 3. Technical interview (system design) · 60 min
- 4. Live coding · 60 min
- 5. Culture check interview with executive · 45 min
This description was AI-summarized.