Sr. Data Engineer
Pune, India
Full-time · Senior · Hybrid · Healthcare
Description
Design and implement high-volume batch and real-time data pipelines using PySpark, SparkSQL, and Databricks Workflows in a modern cloud environment. You will work directly with customers and internal teams to optimize data ingestion and transformation, ensuring healthcare data is clean, connected, and reliable for better decision-making.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, or related technical field.
- 5+ years of hands-on experience as a Data Engineer working with large-scale distributed data processing in cloud environments.
- Working knowledge of U.S. healthcare data domains (claims, eligibility, provider datasets).
- Strong ability to communicate complex technical concepts to technical and non-technical stakeholders.
- Expert-level proficiency in Python, SQL, and PySpark.
- Experience building production-grade ETL/ELT pipelines with Databricks, Airflow, or similar workflow tools.
- Experience with dbt, Kafka, Delta Lake, and event-driven/streaming architectures in a cloud-native environment.
- Experience with structured and semi-structured data formats (Parquet, ORC, JSON, Avro) including schema evolution.
- Strong working knowledge of AWS data ecosystem (S3, SQS, Lambda, Glue, IAM) or equivalent cloud technologies.
- Proficiency with Terraform, infrastructure-as-code, and modern CI/CD pipelines (e.g., GitLab).
- Deep expertise in SQL and compute optimization (Z-Ordering, clustering, partitioning, pruning, caching).
- Hands-on experience with Snowflake (preferred), BigQuery, or Redshift including performance tuning and data modeling.
Responsibilities
- Design and implement high-volume batch and real-time data pipelines using PySpark, SparkSQL, Databricks Workflows, and distributed processing frameworks.
- Build end-to-end ingestion frameworks integrating with Databricks, Snowflake, AWS services (S3, SQS, Lambda), and vendor data APIs.
- Develop data modeling frameworks (star/snowflake schemas) and optimization techniques for cloud data warehouses.
- Lead technical solution design for health plan clients, creating highly available, fault-tolerant architectures across multi-account AWS environments.
- Translate complex business requirements into detailed technical specifications and reusable components.
- Implement security automation (RBAC, encryption, PHI handling, tokenization, auditing, HIPAA/SOC 2 compliance).
- Establish data engineering best practices (CI/CD, code versioning, automated testing, orchestration, logging, observability).
- Conduct performance profiling and optimize compute costs, cluster configurations, partitioning, indexing, and caching across Databricks and Snowflake.
- Produce technical documentation (runbooks, architecture diagrams, operational standards).
- Mentor junior engineers through reviews, coaching, and training.