Research Engineer (LLM Training and Performance)

Amsterdam, Netherlands; Belgrade, Serbia; Berlin, Germany; Limassol, Cyprus; London, United Kingdom; Madrid, Spain; Munich, Germany; Paphos, Cyprus; Prague, Czech Republic; Warsaw, Poland; Yerevan, Armenia
Full-time · Senior · Software Development

Description

You will own the training stack and model architecture for our Mellum LLM family, making training faster, cheaper, and more stable at scale. You'll profile, design, and implement changes to the training pipeline, from architecture to custom GPU kernels.

Requirements

  • Strong PyTorch and PyTorch Distributed experience running multi-node jobs with tens to hundreds of GPUs
  • Hands-on experience with Megatron-LM/Megatron-Core/NeMo, DeepSpeed, or FSDP/ZeRO
  • Real profiling expertise (Nsight Systems/Compute, nvprof) and experience with NVTX-instrumented workflows; see the sketch after this list
  • GPU programming skills in Triton and/or CUDA, with the ability to write, test, and debug kernels
  • Solid understanding of NCCL collectives, topology, and fabric effects (IB/RoCE)
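
As a rough illustration of the NVTX-instrumented workflow the profiling bullet refers to, here is a minimal sketch of a PyTorch training step annotated with NVTX ranges for capture under Nsight Systems (e.g. `nsys profile python train.py`). The linear model and optimizer are hypothetical placeholders, not part of the posting.

```python
# Minimal sketch: NVTX-annotated training step for Nsight Systems capture.
# Model and optimizer are hypothetical stand-ins.
import torch
import torch.cuda.nvtx as nvtx

model = torch.nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = torch.randn(8, 4096, device="cuda")

for step in range(10):
    with nvtx.range(f"step_{step}"):
        with nvtx.range("forward"):
            loss = model(data).square().mean()
        with nvtx.range("backward"):
            loss.backward()
        with nvtx.range("optimizer"):
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
```

Ranges like these appear as labeled spans on the Nsight Systems timeline, which makes gaps between compute and communication easy to spot.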

Responsibilities

  • Improve end-to-end performance for multi-node LLM pre-training and post-training pipelines
  • Profile and fix hotspots using compute/communication overlap, kernel fusion, and scheduling changes
  • Design and evaluate architecture choices (depth/width, attention variants, MoE routing)
  • Implement custom ops (Triton/CUDA C++) and integrate them via PyTorch extensions; a minimal Triton sketch follows this list
  • Push memory and performance levers: FSDP/ZeRO, activation checkpointing, FP8 via Transformer Engine, parallelism strategies, and NCCL tuning
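
To make the custom-op bullet concrete, below is a minimal Triton kernel that fuses an elementwise add and ReLU into a single pass over memory, with a thin PyTorch wrapper. This is an illustrative sketch only; the fused operation and all names are hypothetical, not the team's actual kernels.

```python
# Illustrative sketch: a fused add+ReLU Triton kernel with a PyTorch wrapper.
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Fuse the add and the ReLU so the data is read and written only once.
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
assert torch.allclose(fused_add_relu(x, y), torch.relu(x + y))
```

In practice, a kernel like this would be exposed through a PyTorch extension and benchmarked against the unfused baseline before being adopted.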