Member of Technical Staff – Model Training

Palo Alto, CA

$175,000-$350,000 / year

Full-time · Senior · Artificial Intelligence · Visa Sponsor

Description

You will design, build, and scale post-training pipelines for large language models, focusing on fine-tuning and preference-optimization techniques such as RLHF and DPO. Your work will directly improve model reliability and alignment while reducing cost, and you will collaborate with cross-functional teams to land improvements in production systems.

Requirements

  • Hands-on experience training and fine-tuning large transformer models on multi-GPU/multi-node clusters.
  • Fluent in PyTorch and its ecosystem tools (Torchtune, FSDP, DeepSpeed), with knowledge of distributed-training internals, mixed precision, and memory-efficiency techniques.
  • Shipped or published work on RLHF, DPO, GRPO, or RLAIF, with an understanding of the practical trade-offs among them.
  • Bachelor's degree or equivalent in a related field.
  • Communicates crisply with both technical and non-technical teammates.

Responsibilities

  • Contribute to end-to-end post-training workflows, including dataset curation, hyperparameter search, evaluation, and rollout, using PyTorch, Torchtune, and FSDP/DeepSpeed.
  • Prototype and compare alignment techniques (e.g., curriculum RL, multi-objective reward modeling, tool-use fine-tuning) and push the best ideas into production.
  • Automate training at scale by building robust pipeline components, tools, scripts, and dashboards for reproducible experiments.
  • Define metrics, run A/B tests, and iterate quickly to meet aggressive quality targets.
  • Collaborate with inference, safety, and product teams to land improvements in customer-facing systems.