13h ago
Staff Infrastructure Site Reliability Engineer
Remote
full-time · senior · Remote · software
About This Role
Oscilar is seeking a senior SRE to own reliability across our multi-region cloud-native platform. You'll design and evolve systems that support billions of events and remain resilient through traffic spikes and dependency failures. You will have the autonomy to shape how we scale and build observability.
What You'll Do
- Architect resilient cloud infrastructure (AWS, Pulumi, Kubernetes).
- Lead initiatives to improve availability and latency at scale.
- Design and evolve CI/CD pipelines for speed and safety.
- Define metrics, alerts, and runbooks for observability.
- Run chaos experiments to harden the platform.
Requirements
- Experience as a Senior SRE or Infrastructure Engineer in high-scale environments.
- Expert-level skills in AWS and Infrastructure as Code (Pulumi, Terraform).
- Strong programming ability in Go or Python.
- Deep understanding of distributed systems (Kafka, ClickHouse) and microservices.
Nice to Have
- Mastery of container orchestration (Kubernetes) and production debugging.
- Strong sense of ownership and judgment to balance velocity with reliability.
Benefits & Perks
- Unlimited PTO
- Remote-first culture
- 100% employer-covered health, dental, and vision
- Competitive salary and equity
- 401k plan
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
- 1. Recruiter Screen · 30 min
- 2. Technical Phone Interview · 60 min
- 3. Onsite (Virtual) Interviews · 180 min
Heads Up
- The posting requires expert-level skills and the title suggests staff level, but it gives no explicit years of experience.