Staff+ Software Engineer, Observability

London, UK

$325,000-$390,000 / year

Full-time, senior, artificial intelligence

Description

You will design and build scalable observability infrastructure for Anthropic's AI systems, including telemetry pipelines, alerting, and diagnostic tools. Your work will directly impact the reliability of research and production systems across massive GPU/TPU clusters.

Requirements

  • 10+ years building and operating large-scale observability infrastructure
  • Deep experience in at least one observability signal area (metrics, logs, traces, or errors)
  • Understanding of high-throughput data pipelines and columnar storage
  • Experience with Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar
  • Proficiency in Python, Rust, or Go

Responsibilities

  • Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and errors
  • Own and evolve core observability platforms, driving migrations and architectural improvements
  • Build instrumentation libraries, SDKs, and integrations for high-quality telemetry
  • Drive alerting and SLO infrastructure that keeps alert noise to a minimum
  • Build cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling