Research Engineer, Infrastructure, Numerics
San Francisco
$350k-$475k / year
Full-time, Artificial Intelligence, Visa Sponsor
💼 About This Role
You'll design and build core systems for efficient large-scale model training, focusing on numerics and distributed infrastructure. You'll improve the numerical foundations of our training stack, from precision formats to communication frameworks. This role sits at the intersection of research and systems engineering.
🎯 What You'll Do
- Design and optimize distributed training infrastructure for large-scale LLM training.
- Implement and evaluate low-precision numerics (BF16, MXFP8, NVFP4).
- Develop kernels and communication primitives for mixed-precision arithmetic.
- Collaborate with research teams to co-design model architectures and training recipes.
📋 Requirements
- Bachelor's degree in computer science, electrical engineering, or a related field.
- Understanding of deep learning frameworks like PyTorch or JAX.
- Strong engineering skills in floating-point numerics and distributed systems.
- Experience with low-precision arithmetic and with debugging complex codebases.
✨ Nice to Have
- Familiarity with distributed frameworks like DeepSpeed or Megatron-LM.
- Experience implementing FP8, INT8, or block floating-point formats.
- Prior contributions to open-source deep learning infrastructure.
🎁 Benefits & Perks
- 🏥 Generous health, dental, and vision benefits
- 🏖️ Unlimited PTO
- 👶 Paid parental leave
- 🚚 Relocation support