1d ago

Senior ML Engineer

Poland

✨ $160k+ / year (est.)

full-time · senior · Remote · software

💼 About This Role

You'll join the Kimchi team to optimize LLM inference performance, directly improving customer p99 latency and company margins. You'll own the technical direction of inference optimization across kernel tuning, quantization, and scheduling. This is a high-impact, high-autonomy role: your work on KV cache utilization and throughput has an immediate effect on the bottom line.

🎯 What You'll Do

Push throughput via batching, speculative decoding, and kernel tuning on vLLM, SGLang, and TensorRT-LLM.
Attack latency by profiling and fixing actual bottlenecks (compute, memory, scheduling, networking).
Optimize KV cache with paged attention, prefix caching, eviction policies, and quantized KV.
Quantize weights and activations (INT8, INT4, FP8) while measuring quality on real workloads.
Scale inference across nodes with distributed topologies and network-aware placement.

📋 Requirements

5+ years building ML inference or training infrastructure at production scale.
Strong Python skills with production services experience.
Hands-on experience with vLLM, SGLang, or TensorRT-LLM and understanding of inference engine performance.
Fluency with quantization tradeoffs, including measuring quality regressions.

✨ Nice to Have

Experience with distributed systems (collective communication, sharding, multi-GPU setups).
Bias toward measurement and instrumentation to distinguish real wins from artifacts.
Self-direction and excitement about a wide mandate.

🎁 Benefits & Perks

💰 Competitive salary and equity options.
🌍 Flexible remote-first global environment.
📚 Learning budget for conferences and courses.
🎉 Annual hackathon and team-building budget.
🛠️ Equipment budget and extra days off.

📨 Hiring Process

Estimated timeline: 3-5 weeks

1. Screening call with recruiter · 30 min
2. Hiring manager interview · 45 min
3. Technical interview (system design) · 60 min
4. Live coding · 60 min
5. Culture check interview with an executive · 45 min

This description was AI-summarized.
