Senior Site Reliability Engineer

Cambridge, MA

$160k-$180k / year

full-time · senior · software

💼 About This Role

You'll be responsible for the reliability, scalability, and operational excellence of our AI-powered development platform, keeping it highly available and performant as we scale. As a founding member of the SRE team, you'll have direct influence on architectural decisions.

🎯 What You'll Do

Design, build, and operate scalable, fault-tolerant infrastructure across cloud environments.
Define and enforce SLOs, SLAs, and error budgets; lead blameless postmortems.
Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure.
Own observability: design and maintain logging, metrics, tracing, and alerting stacks.

📋 Requirements

5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
Strong proficiency in AWS (or GCP/Azure) and Kubernetes at scale
Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, etc.)
Deep expertise in observability tooling, incident management, and on-call practices

✨ Nice to Have

Experience supporting AI/ML workloads or GPU-accelerated infrastructure
Prior experience in a high-growth startup environment
Familiarity with eBPF, service mesh (Istio/Linkerd), or advanced networking

🎁 Benefits & Perks

💰 Equity eligibility
🏥 Health and wellness programs
💤 Sleep and recovery promotion
🏃 Movement and restorative activities

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. On-site Interview · 2 hours

This description was AI-summarized.