Machine Learning Ops Engineer

Singapore, Central, Singapore
Senior, Fintech

Description

You will architect and orchestrate a seamless multi-cloud environment, manage the AI tech stack, and ensure robust DataOps pipelines. You will champion finance operations for ML and LLM systems, securing the platform and implementing continuous monitoring to support Data Scientists and AI Engineers.

Requirements

  • 5+ years of technical experience, with a proven track record of shipping ML pipelines in production
  • Multi-Cloud Fluency: Deep expertise in architecting solutions on major cloud platforms (e.g. AWS, GCP)
  • Strong operational grasp of cloud services (e.g. Security, Networking, Storage, AI)
  • Experience in LLM Observability and Cost Optimisation: Experience setting up stacks with self-hosted tools (e.g. Langfuse, LangSmith, Phoenix)
  • Ability to implement caching strategies (e.g. Redis / Memcached)
  • Certifications: Google Professional Machine Learning Engineer, AWS Certified Machine Learning - Specialty, or AWS Certified DevOps Engineer - Professional
  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Expert in Infrastructure as Code (IaC): Mastery of IaC tools (e.g. Terraform, OpenTofu)
  • Experience writing modular, reusable code for multi-environment setups (Dev / Staging / Prod)
  • Proficient in DataOps: Proven implementation of Medallion Architecture on a Data Lakehouse
  • Proficiency with Apache Airflow (writing custom operators), with data quality tools like dbt tests, and with data governance tools (e.g. OpenMetadata)
  • Mastery of CI/CD Automation: Advanced configuration of GitLab CI (e.g. Runners, Secrets Management)
  • Experience with CML (Continuous Machine Learning) is a plus
  • Proficient in Containerisation: Mastery of Docker and of orchestration across VMs and Kubernetes (K8s)
  • Passionate about cost management and efficiency: You view efficiency as a dual mandate, optimising financial costs while maximising system performance

Responsibilities

  • Architect and orchestrate a seamless multi-cloud environment
  • Manage the AI tech stack and systems alongside the enterprise data infrastructure using Terraform
  • Design and maintain robust DataOps pipelines implementing Medallion Architecture (Bronze / Silver / Gold)
  • Use Airflow to orchestrate DAGs and ensure data quality and lineage before the data reaches the models
  • Ensure excellence in the MLOps lifecycle by implementing the "4 C's": CI (Automated linting/testing in GitLab), CD (Safe rollout strategies), CT (Automated retraining triggers), and CM (Continuous Monitoring of drift / latency)
  • Champion Finance operations (cost and efficiency) for ML and LLM systems
  • Implement approaches to prevent redundant API calls, and script automated "Kill Switches" for runaway GPU instances or token spikes
  • Secure the platform by architecting services to allow our team to access different resources securely from different environments, managing IAM Identity Center for least-privilege access
  • Participate in the evaluation of observability tools to trace token usage, error rates per user, and other measures