Staff Software Engineer, GraphQL
Remote - USA
$204,000-$255,000 / year
Full-time, Senior, Remote, Travel and hospitality
Description
You will join the Viaduct team to build and evolve the GraphQL platform that serves over 70% of Airbnb's API traffic, with a focus on reliability, developer experience, and open-source contributions. You will design observability tooling, SLO frameworks, and AI-powered incident response to maintain 99.99% uptime while helping architect the next-generation platform.
Requirements
- 9+ years of software engineering experience with significant depth in backend systems, distributed architectures, and platform engineering
- Deep expertise in observability and monitoring, including experience designing SLO frameworks, distributed tracing systems, and metrics pipelines at scale
- Proven track record in reliability engineering with hands-on experience in incident response, root cause analysis, and building systems that maintain high availability (99.99%+)
- Strong experience with performance tuning and resource management in JVM-based systems, including profiling, garbage collection optimization, and concurrency models
- Experience operating critical, high-traffic systems with a focus on deployment safety, automated rollbacks, and progressive delivery strategies
- Familiarity with GraphQL or similar API gateway/data access layer technologies
- Experience building developer tooling and platforms with a product mindset
- Strong leadership and communication skills
Responsibilities
- Drive platform reliability and operational excellence by designing and implementing deployment pipelines, SLO frameworks, observability tooling, performance improvements, and AI-enabled incident response automation
- Contribute to runtime resiliency initiatives including resource attribution, performance regression testing, and proactive monitoring
- Architect and deliver AI-powered operational tooling that accelerates incident triage and reduces mean-time-to-mitigation
- Shape the future of Viaduct Modern by contributing to the next-generation architecture
- Investigate and resolve complex production issues
- Design and implement observability features including span instrumentation, SLO dashboards, and fine-grained attribution
- Develop and iterate on tooling for deployment triage, service health monitoring, and incident response automation using LLM capabilities
- Lead technical design discussions and RFCs for performance regression testing pipelines, emergency deployment workflows, and runtime resiliency improvements
- Partner with tenant teams to debug performance issues and provide guidance on GraphQL best practices
- Contribute to the open-source Viaduct project