Senior Engineer, Network Observability

London, England
Full-time · Senior · Cloud computing

Description

You will design, develop, and maintain monitoring, telemetry, and observability systems for CoreWeave's GPU cloud network. Based in London, you will build solutions that provide real-time insight into network performance, enabling proactive issue detection and rapid resolution so that the network operates reliably at scale.

Requirements

  • Deep familiarity with Prometheus, Grafana, Alertmanager, gNMI, and SNMP
  • Experience as a Network Engineer, SRE, Software Developer, or Systems Administrator in large-scale environments
  • Proficient with Python, Go, and Bash
  • Comfortable containerizing solutions and deploying them on Kubernetes
  • Strong knowledge of Linux systems and IP networking concepts

Responsibilities

  • Develop, optimize, and maintain network observability platforms
  • Create and automate collectors, exporters, and dashboards using Python and Go
  • Ingest and unify logs, metrics, and events from Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, and Nokia SR Linux devices into a single pipeline
  • Design and implement scalable telemetry solutions using gNMI, SNMP, and streaming analytics
  • Implement advanced alerting and anomaly detection with Prometheus, Grafana, and Alertmanager