Senior Site Reliability Engineer, Observability
Argentina
$160k-$200k / year (est.)
Full-time, Senior, Remote, Software
About This Role
You'll join Webflow's newly formed Observability team to improve the reliability and stability of customer-facing production infrastructure that serves millions of page views per hour. You'll own and evolve the observability stack, including OpenTelemetry and Datadog, and continuously raise the bar on observability practices. This is a remote-first role at a mission-driven company.
What You'll Do
- Own and evolve Webflow's observability stack, including OpenTelemetry and Datadog
- Dive into TypeScript, Node, or Go to debug and fix production behavior
- Drive adoption of SLOs, distributed tracing, and structured logging
- Build and maintain AI-powered agents for faster incident resolution
Requirements
- 5+ years of experience building and debugging distributed systems in customer-facing environments
- Hands-on experience with observability platforms such as Datadog, Grafana, Prometheus, and Elasticsearch
- Experience with OpenTelemetry or similar instrumentation frameworks
- Experience with container-centric architectures using Docker and Kubernetes
Nice to Have
- Experience building or operating AI agents for observability data
- Experience with Pulumi and Kubernetes specifically
- Experience improving on-call and incident response processes
Benefits & Perks
- Ownership in what you help build
- Remote-first work environment
- Company-wide bonus program
Hiring Process
Estimated timeline: 2-4 weeks (AI estimate)
- Step 1: Recruiter Screen (30 min)
- Step 2: Technical Interview (60 min)
- Step 3: Hiring Manager (45 min)