1h ago

Staff Site Reliability Engineer - Data Center

Boston, MA

$165,750-$224,450 / year

full-time | senior | Remote | Healthcare AI

Description

You will design, build, and operate PathAI's hybrid cloud/on-prem environment to support AI-powered pathology. Implement SRE best practices, automate infrastructure, and improve reliability for critical production systems that accelerate drug development and patient diagnosis.

Requirements

  • 8+ years of relevant experience
  • Automation with scripting, Ansible, and Python/Go
  • Experience with observability tools (Datadog/Grafana/Prometheus)
  • Infrastructure as Code (Terraform/CloudFormation)
  • Administer physical hardware (iDRAC/IPMI/NVIDIA UFM/Juniper Systems)

Responsibilities

  • Implement SRE best practices focusing on users, monitoring, and automation
  • Engineer infrastructure patterns for AWS with security, reliability, and scalability
  • Design, build, and operate a datacenter for the Machine Learning team
  • Integrate the on-premises datacenter with cloud infrastructure for a seamless hybrid cloud
  • Participate in on-call rotations and incident response