about 2 hours ago

Staff Software Engineer, GraphQL

Remote - USA

$204,000-$255,000 / year

Full-time, Senior, Remote, Travel and hospitality

Description

You will join the Viaduct team to build and evolve a GraphQL platform that handles over 70% of Airbnb's API traffic, focusing on reliability, developer experience, and open-source contributions. You'll design observability tooling, SLO frameworks, and AI-powered incident response to maintain 99.99% uptime, while architecting the next-generation platform.

Requirements

  • 9+ years of software engineering experience with significant depth in backend systems, distributed architectures, and platform engineering
  • Deep expertise in observability and monitoring, including experience designing SLO frameworks, distributed tracing systems, and metrics pipelines at scale
  • Proven track record in reliability engineering with hands-on experience in incident response, root cause analysis, and building systems that maintain high availability (99.99%+)
  • Strong experience with performance tuning and resource management in JVM-based systems, including profiling, garbage collection optimization, and understanding of concurrency models
  • Experience operating critical, high-traffic systems with a focus on deployment safety, automated rollbacks, and progressive delivery strategies
  • Familiarity with GraphQL or similar API gateway/data access layer technologies
  • Experience building developer tooling and platforms with a product mindset
  • Strong leadership and communication skills

Responsibilities

  • Drive platform reliability and operational excellence by designing and implementing deployment pipelines, SLO frameworks, observability tooling, performance improvements, and AI-enabled incident response automation
  • Contribute to runtime resiliency initiatives including resource attribution, performance regression testing, and proactive monitoring
  • Architect and deliver AI-powered operational tooling that accelerates incident triage and reduces mean-time-to-mitigation
  • Shape the future of Viaduct Modern by contributing to the next-generation architecture
  • Investigate and resolve complex production issues
  • Design and implement observability features including span instrumentation, SLO dashboards, and fine-grained attribution
  • Develop and iterate on tooling for deployment triage, service health monitoring, and incident response automation using LLM capabilities
  • Lead technical design discussions and RFCs for performance regression testing pipelines, emergency deployment workflows, and runtime resiliency improvements
  • Partner with tenant teams to debug performance issues and provide guidance on GraphQL best practices
  • Contribute to the open-source Viaduct project