11h ago

Member of Technical Staff - Training Platform

San Francisco, CA

$150k-$300k / year

full-timemidai-ml Visa Sponsor

๐Ÿ›  Tech Stack

+3

๐Ÿ’ผ About This Role

You'll build the hosted training platform that lets users launch fine-tuning runs on managed GPU clusters. You'll design the developer-facing platform and the Kubernetes-based training infrastructure. This role spans full-stack product development and distributed systems at frontier AI scale.

๐ŸŽฏ What You'll Do

  • Design and operate Kubernetes-based training orchestration across multi-cloud GPU fleets
  • Build FastAPI backend services and REST APIs for job submission and monitoring
  • Ship product UI in Next.js / React / TypeScript with live run monitoring
  • Develop Python control-plane agents that watch pods and keep clusters in sync

๐Ÿ“‹ Requirements

  • 3-5 years of experience in platform or infrastructure engineering
  • Strong Kubernetes operations experience with Helm, CRDs, operators
  • Strong Python backend development with FastAPI
  • Working knowledge of distributed training and GPU hardware

โœจ Nice to Have

  • Experience with vLLM, SGLang, or TensorRT-LLM
  • Familiarity with GCP (GKE, GCS) and infrastructure automation
  • Frontend experience with React/Next.js and TypeScript

๐ŸŽ Benefits & Perks

  • ๐Ÿ’ฐ Competitive salary ($150Kโ€“$300K) with significant equity
  • ๐Ÿก Flexible work (remote or San Francisco office)
  • ๐Ÿ›‚ Full visa sponsorship and relocation support
  • ๐Ÿ“š Professional development budget for courses and conferences
  • ๐ŸŒ Regular team off-sites and conference attendance

๐Ÿ“จ Hiring Process

Estimated timeline: 1-3 weeks ยท AI estimate

  1. 1Recruiter Screenยท 30 min
  2. 2Technical Interviewยท 60 min
  3. 3Final Roundยท 60 min
0 0 0