Site Reliability Engineer
Lisbon
$55k-$68k / year
Full-time · Mid-level · Remote · Software · Visa Sponsor
About This Role
You'll ensure the reliability, availability, and scalability of systems at a fast-growing AI company. You'll implement automation, monitoring, and performance optimization strategies to minimize downtime and improve resilience. This onsite role in Lisbon includes relocation support.
What You'll Do
- Design scalable, reliable, and fault-tolerant systems
- Build observability tooling with Prometheus, Grafana, Datadog, and the ELK stack
- Automate infrastructure provisioning with Infrastructure as Code (IaC), and automate routine incident-response tasks
- Optimize system performance and streamline incident response workflows
Requirements
- 4+ years in SRE, DevOps, or Systems Engineering
- Strong knowledge of cloud platforms (AWS, Azure, GCP)
- Experience with observability tools (Prometheus, Grafana, ELK, Datadog)
- Proficiency in Infrastructure as Code (Terraform, CloudFormation)
Nice to Have
- Hands-on experience with containerization and orchestration
- Knowledge of security best practices and compliance
- Experience with incident management and root cause analysis
Benefits & Perks
- Apple hardware for work
- Annual bonus
- Top-tier health and life insurance
- Transportation budget
- Coverflex benefits package
Hiring Process
Estimated timeline: 2-4 weeks · AI estimate
1. Recruiter Screen · 30 min
2. Technical Interview · 60 min
3. System Design Interview · 60 min
4. Hiring Manager Interview · 45 min
5. Reference Check · 15 min
Heads Up
- The requirement to apply without AI assistance may deter some candidates
- Vague company description without specific product details