Senior Engineer, Network Observability
London, England
full-time, senior, cloud computing
Description
You will design, develop, and maintain monitoring, telemetry, and observability systems for CoreWeave's GPU cloud network in London. You'll build solutions that provide real-time insights into network performance, enabling proactive issue detection and rapid resolution to ensure reliable operation at scale.
Requirements
- Deep familiarity with Prometheus, Grafana, Alertmanager, gNMI, and SNMP
- Experience as a Network Engineer, SRE, Software Developer, or Systems Administrator in large-scale environments
- Proficient with Python, Go, and Bash
- Comfortable containerizing solutions and running them on Kubernetes
- Strong knowledge of Linux systems and IP networking concepts
Responsibilities
- Develop, optimize, and maintain network observability platforms
- Create and automate collectors, exporters, and dashboards using Python and Go
- Ingest and unify logs, metrics, and events from Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, and SR Linux into a single pipeline
- Design and implement scalable telemetry solutions using gNMI, SNMP, and streaming analytics
- Implement advanced alerting and anomaly detection with Prometheus, Grafana, and Alertmanager