Senior Data Engineer

Kathmandu, Nepal
Full-time · Senior · Healthcare

Description

As a Senior Data Engineer at Abacus Insights, you will architect and implement high-volume batch and real-time data pipelines in a modern cloud environment, enabling clean, connected healthcare data for GenAI use cases. You will work with Databricks, Snowflake, and AWS to build scalable ingestion frameworks, optimize data models, and enforce engineering best practices while mentoring junior team members.

Requirements

  • Bachelor’s degree in Computer Science, Computer Engineering, or a closely related technical field, with 5+ years of hands-on experience as a Data Engineer building and operating large-scale, distributed data systems in modern cloud environments.
  • Proven ability to clearly communicate complex technical concepts and solutions to both technical and non-technical stakeholders.
  • Expert-level proficiency in Python, SQL, and PySpark, including development of distributed transformations and performance-optimized queries.
  • Demonstrated experience designing, building, and operating ETL/ELT pipelines using Databricks, Airflow, or similar orchestration and workflow automation tools.
  • Proven experience architecting or operating large-scale data platforms using dbt, Kafka, Delta Lake, and event-driven or streaming architectures in cloud-native data or platform engineering environments.
  • Strong working knowledge of AWS data services (S3, SQS, Lambda, Glue, IAM or equivalents), structured and semi-structured data formats (Parquet, ORC, JSON, Avro), schema evolution, and optimization techniques.
  • Hands-on experience with Terraform and CI/CD pipelines (e.g., GitLab CI); deep expertise in SQL and compute optimization (partitioning, clustering, Z-Ordering, pruning, caching); and performance tuning on cloud data warehouses such as Snowflake (preferred), BigQuery, or Redshift.

Responsibilities

  • Architect, design, and implement high-volume batch and real-time data pipelines using PySpark, SparkSQL, Databricks Workflows, and distributed processing frameworks.
  • Build end-to-end ingestion frameworks integrating Databricks, Snowflake, AWS services (S3, SQS, Lambda), and vendor APIs, ensuring data quality, lineage, and schema evolution.
  • Design and optimize data models (star/snowflake schemas) and apply performance tuning techniques for analytical workloads on cloud data warehouses.
  • Translate complex business requirements into detailed technical specifications, reusable engineering components, and implementation artifacts.
  • Establish and enforce data engineering best practices, including CI/CD for data pipelines, version control, automated testing, orchestration, logging, and observability.
  • Drive performance and cost optimization through profiling, cluster tuning, partitioning, indexing, caching, and compute optimization across Databricks and Snowflake.
  • Ensure operational excellence and team growth by producing high-quality documentation, monitoring and troubleshooting production pipelines, performing root-cause analysis, and mentoring junior engineers.