1d ago

Senior ML Engineer

Spain

✨ $150k-$220k / year est.

full-time · senior · Remote · software

💼 About This Role

You'll join the Kimchi team to optimize LLM inference performance, focusing on throughput, latency, and KV cache efficiency for a growing customer base. You'll lead the technical direction of inference optimization with high autonomy. This role directly impacts customer p99 latency and company margins.

🎯 What You'll Do

Push throughput via continuous batching and kernel-level tuning
Cut latency by profiling and fixing bottlenecks
Optimize KV cache utilization with paged attention and prefix caching
Quantize weights and activations without quality regression

📋 Requirements

5+ years building production ML systems
Strong Python skills with production services experience
Hands-on experience with vLLM, SGLang, or TensorRT-LLM
Fluency in quantization tradeoffs and measurement

✨ Nice to Have

Distributed systems experience with collective communication
Knowledge of multi-GPU and multi-node inference
Self-direction in a wide-mandate role

🎁 Benefits & Perks

💰 Competitive salary plus equity options
🏖️ Flexible remote-first global environment
📚 Learning budget for conferences and courses
💻 Equipment budget for your home office
🗓️ Extra days off for work-life balance

📨 Hiring Process

Estimated timeline: 3-5 weeks

1. Screening call with recruiter · 30 min
2. Hiring manager interview · 45 min
3. Technical interview (system design) · 60 min
4. Live coding · 60 min
5. Culture check interview with executive · 45 min

This description was AI-summarized.