Senior Site Reliability Engineer at Jobgether

7h ago

Senior Site Reliability Engineer

$149.1k-$157.8k / year

full-time · senior · Remote · ai-ml

💼 About This Role

You'll lead reliability initiatives for a cutting-edge AI-driven platform, designing resilient infrastructure for AI pipelines. You'll shape observability, automation, and developer enablement to ensure scalable, reliable services. This fully remote role offers strong technical ownership and leadership influence.

🎯 What You'll Do

Define SLIs, SLOs, and error budgets for production services and AI workloads
Design resilient infrastructure patterns for AI pipelines and observability
Lead incident response, disaster recovery, and post-incident reviews for long-term improvements
Develop and maintain observability solutions using monitoring and tracing tools
Manage infrastructure as code and automation strategies to improve operational efficiency

📋 Requirements

6–8 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps
Deep expertise with AWS, Kubernetes, Docker, Terraform, and GitOps
Strong experience with observability platforms and distributed tracing
Proficiency in Python and/or Bash scripting

✨ Nice to Have

Experience with Internal Developer Platform tools like Backstage
Experience supporting AI/ML infrastructure, LLM integrations, or agentic systems
Experience with FinOps, disaster recovery, policy-as-code, or regulated environments

🎁 Benefits & Perks

💰 Competitive salary ($149,100 – $157,800)
🏥 Medical, dental, and vision coverage
🏖️ Flexible vacation policy
📚 Company-sponsored training and professional development
🧘 Annual wellness and fitness reimbursement

📨 Hiring Process

Estimated timeline: 2-4 weeks · AI estimate

1. Recruiter screen · 30 min
2. Technical interview · 60 min
3. Hiring manager interview · 45 min
