

Job Description

About the role:

We are seeking an experienced data engineer to join our team building a scalable and secure health data platform. In this role, you will design, build, and optimise data pipelines (batch and streaming) for big data systems, and extract, analyse, and model rich and diverse health data sets.

Responsibilities:

Design and implement data pipelines, ETL processes, schemas, and data models to ingest, process, and prepare multi-petabyte scale datasets for downstream analytics and machine learning.

Build and optimize data processing systems on modern platforms such as Microsoft Fabric, Databricks, Spark, Delta Lake, Azure Data Factory/Airflow, Purview, and Kafka.

Implement data quality, validation, and monitoring measures leveraging tools such as Great Expectations.

Ensure compliance with security, access control, and regulatory requirements related to PHI and other sensitive data types.

Support adoption of emerging standards like FHIR for healthcare data exchange.

Collaborate with data scientists, analysts, and engineers to understand data needs and deliver performant, reliable data products.

Track emerging technologies and trends in data engineering, incorporating modern tooling and best practices.

Qualifications:

4+ years of experience building and operating production big data platforms and pipelines

Strong experience with SQL, Spark, workflow orchestrators (ADF/Airflow), distributed message buses, Python, Delta Lake, the Apache big data tool suite, Docker, Kubernetes, and MPP databases

Hands-on experience designing and implementing cloud-based data solutions on platforms such as Azure, AWS, or GCP, optimizing for scalability, cost-efficiency, and performance

Experience implementing and maintaining data lakes, including data modeling, ETL processes, and data quality assurance, to empower data-driven decision-making

Experience developing real-time data pipelines with streaming technologies such as Apache Kafka or Azure Event Hubs, enabling timely insights and actions from incoming data streams

Previous experience working with health data and the Azure cloud is a strong plus

Experience with Databricks or MS Fabric

Strong track record of designing and implementing scalable data models, schemas, and ETL logic

Experience with data governance, master data management, data pseudonymization and anonymization, and data catalog solutions

A strong interest in learning new things and a team-player ethic

Strong analytical skills and good understanding of data structures and algorithms.

Some exposure to Nextflow and/or Nextflow Tower

Nice to have:

Experience building data pipelines for machine learning and working with unstructured datasets.

Knowledge of genomics, medical imaging, and/or EHR data domains

Knowledge of HIPAA, HL7 and other healthcare data privacy requirements.

Experience with Azure Batch and Blob Storage

Job ID: 146064673
