Principal Site Reliability & Forward Deployed Engineer
Remote - USA
Full-time, Senior, Remote, Healthcare IT
Description
You will own the most complex production issues across our platform, blending SRE, forward-deployed engineering, and hands-on software development. You'll lead technical incident response, drive root cause analysis, work directly with strategic customers, and translate learnings into durable improvements to reliability and operability.
Requirements
- 10+ years of experience in software engineering, SRE, sustaining engineering, or production operations.
- Deep hands-on experience operating production systems in AWS.
- Strong experience troubleshooting Databricks and large-scale data platforms.
- Proficiency in Python and experience building production services or tooling.
- Strong understanding of distributed systems, incident management and RCA practices, monitoring, alerting, and observability.
- Experience with CI/CD pipelines that leverage Infrastructure as Code.
- Proven ability to own problems end-to-end.
- Excellent communication skills, especially during incidents and customer escalations.
- Ability to work backward from customer impact to root cause.
- Strong instinct for operational risk.
Responsibilities
- Act as a senior technical escalation point during production incidents.
- Lead real-time incident triage, mitigation, and recovery efforts.
- Drive root cause analysis (RCA) with a focus on systemic, long-term fixes.
- Own post-launch reliability, stability, and operational quality of core systems.
- Investigate and resolve complex field issues and production defects.
- Engage directly with strategic customers to solve real-world, production-grade technical challenges.
- Support complex deployments, integrations, and escalations in customer environments.
- Serve as a subject matter expert for AWS-hosted production systems.
- Write production-quality code to automate operational workflows and improve reliability.
- Provide technical leadership without formal authority.