Senior Data Engineer
Nepal
Full-time · Senior · Healthcare Technology
Tech Stack
PySpark · Databricks · AWS · Python · SQL · Snowflake · Airflow · Kafka · Delta Lake · DBT · Terraform
Description
You will architect and implement high-volume batch and real-time data pipelines using PySpark, Databricks, and AWS in a modern cloud environment. Working within the TechOps division, you will design data integration solutions that enable reliable data ingestion and transformation for health plan data, directly supporting better healthcare decisions.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, or a closely related technical field.
- 5+ years of hands-on experience as a Data Engineer building and operating large-scale, distributed data systems in modern cloud environments.
- Proven ability to clearly communicate complex technical concepts and solutions to both technical and non-technical stakeholders.
- Expert-level proficiency in Python, SQL, and PySpark, including development of distributed transformations and performance-optimized queries (a sketch of such a transformation follows this list).
- Demonstrated experience designing, building, and operating ETL/ELT pipelines using Databricks Workflows, Airflow, or similar orchestration and workflow-automation tools.
- Proven experience architecting or operating large-scale data platforms using DBT, Kafka, Delta Lake, and event-driven or streaming architectures in cloud-native data or platform engineering environments.
- Strong working knowledge of AWS data services (S3, SQS, Lambda, Glue, IAM or equivalents), structured and semi-structured data formats (Parquet, ORC, JSON, Avro), schema evolution, and optimization techniques.
- Hands-on experience with Terraform and CI/CD pipelines (e.g., GitLab).
- Deep expertise in SQL and compute optimization, including partitioning, clustering, Z-Ordering, pruning, and caching (a brief sketch follows this list).
- Performance tuning on cloud data warehouses such as Snowflake (preferred), BigQuery, or Redshift.
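
As a concrete illustration of the PySpark requirement above, here is a minimal sketch of a distributed transformation over semi-structured data, written against public PySpark APIs. The bucket paths, column names, and the claims dataset itself are hypothetical, not taken from the employer's systems.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-batch").getOrCreate()

# Read semi-structured JSON from a hypothetical landing path.
claims = spark.read.json("s3://example-bucket/raw/claims/")

# Distributed transformation: type the date, filter early to cut shuffle
# volume, then aggregate per member and day.
daily_totals = (
    claims
    .withColumn("service_date", F.to_date("service_date"))
    .filter(F.col("status") == "approved")
    .groupBy("member_id", "service_date")
    .agg(F.sum("billed_amount").alias("total_billed"))
)

# Write columnar Parquet partitioned by date so downstream readers can prune.
(daily_totals
    .write.mode("overwrite")
    .partitionBy("service_date")
    .parquet("s3://example-bucket/curated/daily_claim_totals/"))
```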
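
And a brief sketch of the compute-optimization techniques named in the final requirements bullet, using Delta Lake SQL as it runs on Databricks; the table and column names are again hypothetical.

```python
# Hypothetical Delta table; assumes a Databricks/Delta Lake runtime.
# OPTIMIZE compacts small files; ZORDER co-locates rows on a selective key.
spark.sql("OPTIMIZE curated.daily_claim_totals ZORDER BY (member_id)")

# A predicate on the partition column lets Spark prune whole partitions,
# and caching helps when the same slice is scanned repeatedly.
recent = spark.sql("""
    SELECT member_id, SUM(total_billed) AS billed
    FROM curated.daily_claim_totals
    WHERE service_date >= '2024-01-01'
    GROUP BY member_id
""").cache()
recent.show()
```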
Responsibilities
- Architect, design, and implement high-volume batch and real-time data pipelines using PySpark, SparkSQL, Databricks Workflows, and distributed processing frameworks.
- Build end-to-end ingestion frameworks integrating Databricks, Snowflake, AWS services (S3, SQS, Lambda), and vendor APIs, with data quality, lineage, and schema evolution handled throughout (see the ingestion sketch after this list).
- Design and optimize data models (star/snowflake schemas) and apply performance-tuning techniques for analytical workloads on cloud data warehouses (a star-schema query sketch follows this list).
- Translate complex business requirements into detailed technical specifications, reusable engineering components, and implementation artifacts.
- Establish and enforce data engineering best practices, including CI/CD for data pipelines, version control, automated testing, orchestration, logging, and observability (a minimal pipeline test sketch follows this list).
- Drive performance and cost optimization through profiling, cluster tuning, partitioning, indexing, caching, and compute optimization across Databricks and Snowflake.
- Ensure operational excellence and team growth by producing high-quality documentation (runbooks, architecture diagrams), monitoring and troubleshooting production pipelines, performing root-cause analysis, and mentoring junior engineers.
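
To ground the ingestion responsibility above, here is a hedged sketch of an S3-to-Delta stream using Databricks Auto Loader with SQS-backed file notifications. The bucket names, the bronze.claims table, and the checkpoint paths are hypothetical.

```python
# Assumes a Databricks runtime where the cloudFiles source is available.
raw_stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # File-notification mode discovers new S3 objects via event notifications
    # delivered through SQS, instead of repeatedly listing the bucket.
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/claims/")
    .load("s3://example-bucket/landing/claims/")
)

(raw_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/claims/")
    .option("mergeSchema", "true")   # tolerate additive schema evolution
    .trigger(availableNow=True)      # process the backlog, then stop
    .toTable("bronze.claims"))
```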
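
The data-modeling responsibility tends to produce query shapes like the following; the gold.fact_claims fact table and its dimensions are invented names for illustration. The same star-schema pattern applies on Snowflake or Databricks SQL.

```python
# Hypothetical star schema: one fact table joined to conformed dimensions.
spark.sql("""
    SELECT d.calendar_month,
           p.plan_name,
           SUM(f.total_billed) AS billed
    FROM gold.fact_claims f
    JOIN gold.dim_date d ON f.date_key = d.date_key
    JOIN gold.dim_plan p ON f.plan_key = p.plan_key
    WHERE d.calendar_year = 2024
    GROUP BY d.calendar_month, p.plan_name
""").show()
```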
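
Finally, the best-practices bullet calls for automated testing of pipelines; a minimal pytest-style unit test of the aggregation logic might look like this, run on a local SparkSession in CI. The test data and names are invented.

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Small local session so the test runs in CI without a cluster.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def test_daily_totals_counts_only_approved_claims(spark):
    rows = [
        ("m1", "2024-01-01", "approved", 100.0),
        ("m1", "2024-01-01", "denied",    50.0),
    ]
    df = spark.createDataFrame(rows, ["member_id", "service_date", "status", "billed_amount"])
    result = (df.filter(df.status == "approved")
                .groupBy("member_id", "service_date")
                .agg({"billed_amount": "sum"})
                .collect())
    assert result[0]["sum(billed_amount)"] == 100.0
```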