Senior Site Reliability Engineer

Cambridge, MA

$160k-$180k / year

full-time · senior · software

💼 About This Role

You'll be the backbone of our AI-powered development platform's reliability, scalability, and operational excellence. Your core responsibility is keeping the platform highly available and performant as we scale. As a founding member of the SRE team, you'll have direct influence on architectural decisions.

🎯 What You'll Do

  • Design, build, and operate scalable, fault-tolerant infrastructure across cloud environments.
  • Define and enforce SLOs, SLAs, and error budgets; lead blameless postmortems.
  • Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure.
  • Own observability: design and maintain logging, metrics, tracing, and alerting stacks.

📋 Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • Strong proficiency in AWS (or GCP/Azure) and Kubernetes at scale
  • Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, etc.)
  • Deep expertise in observability tooling, incident management, and on-call practices

✨ Nice to Have

  • Experience supporting AI/ML workloads or GPU-accelerated infrastructure
  • Prior experience in a high-growth startup environment
  • Familiarity with eBPF, service mesh (Istio/Linkerd), or advanced networking

๐ŸŽ Benefits & Perks

  • 💰 Equity eligibility
  • 🏥 Health and wellness programs
  • 💤 Sleep and recovery promotion
  • 🏃 Movement and restorative activities

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. On-site Interview · 2 hours

This description was AI-summarized.