Senior Site Reliability Engineer, Observability

Argentina

✨ $160k-$200k / year (est.)

full-time · senior · Remote · software

💼 About This Role

You'll join Webflow's newly formed Observability team to improve the reliability and stability of customer-facing production infrastructure that serves millions of page views per hour. You'll own and evolve the observability stack, including OpenTelemetry and Datadog, and continuously raise the bar on observability practices. This is a remote-first role at a mission-driven company.

🎯 What You'll Do

  • Own and evolve Webflow's observability stack, including OpenTelemetry and Datadog
  • Dive into TypeScript, Node.js, or Go code to debug and fix production issues
  • Drive adoption of SLOs, distributed tracing, and structured logging
  • Build and maintain AI-powered agents for faster incident resolution

📋 Requirements

  • 5+ years of experience building and debugging distributed systems in customer-facing environments
  • Hands-on experience with observability platforms such as Datadog, Grafana, Prometheus, or Elasticsearch
  • Experience with OpenTelemetry or similar instrumentation frameworks
  • Experience with container-centric architectures using Docker and Kubernetes

✨ Nice to Have

  • Experience building or operating AI agents for observability data
  • Experience with Pulumi and Kubernetes specifically
  • Experience improving on-call and incident response processes

๐ŸŽ Benefits & Perks

  • ๐Ÿ–๏ธ Ownership in what you help build
  • ๐Ÿ’ป Remote-first work environment
  • ๐Ÿ“ˆ Company-wide bonus program

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

  1. Recruiter Screen · 30 min
  2. Technical Interview · 60 min
  3. Hiring Manager · 45 min