Senior Site Reliability Engineer
Cambridge, MA
$160k-$180k / year
full-time · senior · software
About This Role
You'll own the reliability, scalability, and operational excellence of our AI-powered development platform, ensuring high availability and performance as we scale. As a founding member of the SRE team, you'll have direct influence on architectural decisions.
What You'll Do
- Design, build, and operate scalable, fault-tolerant infrastructure across cloud environments.
- Define and enforce SLOs, SLAs, and error budgets; lead blameless postmortems.
- Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure.
- Own observability: design and maintain logging, metrics, tracing, and alerting stacks.
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
- Strong proficiency in AWS (or GCP/Azure) and Kubernetes at scale
- Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, etc.)
- Deep expertise in observability tooling, incident management, and on-call practices
Nice to Have
- Experience supporting AI/ML workloads or GPU-accelerated infrastructure
- Prior experience in a high-growth startup environment
- Familiarity with eBPF, service mesh (Istio/Linkerd), or advanced networking
Benefits & Perks
- Equity eligibility
- Health and wellness programs
- Sleep and recovery promotion
- Movement and restorative activities
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Interview · 60 min
- 3. On-site Interview · 2 hours
This description was AI-summarized.