6 days ago

Platform & Reliability Engineer

Singapore
Full-time | Senior | Fintech | Visa sponsorship

Description

You'll be the architect of how we deploy, scale, and operate the systems that keep payments flowing smoothly, owning reliability and performance end to end, from designing SLOs to optimizing costs and developer workflows. As the first platform-focused hire at a fast-growing fintech startup, you'll define best practices, shape engineering culture, and ensure we ship software that is fast, secure, and resilient at scale.

Requirements

  • 5+ years of experience in platform, reliability, or infrastructure engineering, building and operating production systems at scale, ideally on Google Cloud or a similar serverless stack and in fast-paced or startup settings
  • Hands-on fluency with GCP, including Firebase, Cloud Build, Cloud Run/Functions, Pub/Sub, Cloud SQL/Spanner, and VPC Service Controls
  • Proficiency with infrastructure as code and automation scripting; strong coding skills in Python or Go, with an eye on maintainability
  • Familiarity with CI/CD and modern cloud-native architectures
  • Proven track record of driving observability, on-call practices, and cost optimization in a fast-moving environment
  • Excellent collaboration and communication skills for working effectively with cross-functional teams
  • Bachelor's degree or equivalent practical experience
  • Experience with payments, PCI-DSS, or crypto settlement flows is a bonus

Responsibilities

  • Define, track, and evangelize latency and availability targets for our payment APIs
  • Deploy Cloud Monitoring, Cloud Trace, Error Reporting, and dashboards; integrate alerts via Incident.io and Slack for on-call
  • Establish blameless postmortems, guardrails, and runbooks to drive learning and keep incidents from recurring
  • Codify Cloud Build pipelines and automated canary rollouts for Cloud Functions / Cloud Run
  • Manage GCP resources; embed security, IAM least-privilege, and cost controls by default
  • Profile hot paths (BigQuery, Firestore, Pub/Sub) and implement caching or concurrency improvements to keep user-facing latency under 100 ms
  • Eliminate toil by improving local-to-prod parity and secrets management, and by making it possible to spin up an environment with a single command
  • Instill reliability thinking across engineering and product as the first platform-focused hire