6 days ago
DevOps - Platform Engineer
United States
Full-time, Senior, Remote, Healthcare Technology
Description
You'll be a core contributor to our infrastructure, owning the systems that keep August Health fast, secure, and resilient as we scale. This is a high-autonomy, high-impact role: you'll work closely with our engineering team to shape how we build, deploy, and operate software, with real influence over architecture decisions and engineering culture.
Requirements
- Strong hands-on experience with AWS — particularly EKS, Cognito, Aurora, RDS, Lambda, and VPC; you can make smart tradeoff decisions across services and know when to reach for each
- Proficiency with Kubernetes in production — you've operated clusters at scale and know how to debug when things go wrong
- Experience with infrastructure as code, ideally Pulumi or a similar tool (Terraform, CDK)
- Comfort with GitHub Actions or similar CI/CD systems — you've built and optimized pipelines, not just used them
- A security-minded approach — you think about least privilege, secrets management, and compliance by default; experience working toward or maintaining SOC 2 and/or HIPAA compliance is important, not just a nice-to-have
- Solid observability experience — you're comfortable with Prometheus, have instrumented backend services before, and can look at an existing metrics setup and form a point of view on what's missing or misleading
- Familiarity with data pipeline infrastructure, including tools like Snowflake and Apache NiFi
- Strong communication skills — you can explain infrastructure decisions to non-infrastructure engineers, and you write good documentation
- Self-direction — you can identify what needs doing, prioritize well, and drive projects to completion without heavy oversight
Responsibilities
- Infrastructure as code — managing and evolving our AWS infrastructure using Pulumi, with a focus on reliability, cost efficiency, and maintainability
- Kubernetes platform — operating and improving our K8s clusters: workload scheduling, resource management, networking, and observability
- CI/CD pipelines — owning and optimizing our GitHub Actions workflows to keep builds fast, feedback tight, and deployments safe
- Security & compliance — hardening our infrastructure posture, supporting audit readiness, and implementing controls that meet the requirements of operating in healthcare
- Data pipeline infrastructure — supporting the reliable operation of our data engineering workflows
- LLM tooling — deploying and maintaining prompt tracing, evaluation, and observability tools as we integrate AI capabilities into our product
- Network & access — managing secure, zero-trust connectivity via Tailscale across our distributed infrastructure
- Disaster recovery & incident response — designing, documenting, and regularly testing DR/IR processes so we're always ready