18h ago

Staff Site Reliability Engineer

New York City, NY

$157.7k-$277.8k / year

full-time · senior · Hybrid · ai-ml

💼 About This Role

You'll build resilient systems and automate infrastructure for an enterprise AI platform that runs in high-traffic production environments. Your work will directly support 24/7 availability for customers such as Marriott and Uber. The role offers the chance to shape reliability practices at a rapidly growing unicorn.

🎯 What You'll Do

  • Build AI-native approaches for infrastructure management using Python or Go
  • Design scalable, fault-tolerant infrastructure on AWS, GCP, or Azure
  • Own the reliability, performance, and efficiency of core services, measured against SLOs
  • Lead incident response, post-mortems, and root cause analyses

📋 Requirements

  • 7+ years of experience in an SRE or similar role
  • Deep expertise with AWS, Docker, Kubernetes, and Terraform
  • Strong proficiency in Python, Java, or Go for automation
  • Knowledge of Prometheus, Grafana, or ELK Stack

✨ Nice to Have

  • Experience with AI/ML platforms
  • Experience with high-traffic production systems

🎁 Benefits & Perks

  • 🏖️ Generous PTO plus company holidays
  • 🏥 Medical, dental, and vision coverage for you and family
  • 👶 12 weeks paid parental leave for all parents
  • 💪 Wellness stipend for gym, massage, etc.
  • 📚 Learning and development stipend

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Phone Screen · 30 min
  2. Technical Interview · 60 min
  3. System Design Interview · 60 min