As a Data Engineer, your main goal is to build and maintain the systems that process and store data for the Events & Exhibitions ecosystem. You will take scattered, vendor-specific data (from registration systems, apps, and marketing tools) and transform it into a unified, AI-ready dataset using a Medallion Architecture on Azure. Think of it as organizing raw data into a structured pipeline that's ready for analysis and machine learning.
Requirements
- Data Ingestion & API Integration (Bronze Layer)
  - Build and manage robust ETL/ELT pipelines using Azure Data Factory to ingest data from third-party vendors (REST APIs, Webhooks, SFTP)
  - Ensure raw data lands securely in Azure Data Lake Gen2 (Bronze Layer) without data loss
  - Implement error handling and logging to monitor the health of real-time and batch ingestion jobs
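As a rough illustration of this ingestion pattern (the vendor name, Bronze path layout, and payload shape here are hypothetical, and in practice Azure Data Factory pipelines would orchestrate the work), a minimal landing step with retries and logging might look like:

```python
import json
import logging
import time
from datetime import datetime, timezone
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bronze_ingest")

def land_raw(fetch, vendor: str, bronze_root: str, retries: int = 3) -> Path:
    """Fetch a raw vendor payload and land it unmodified in the Bronze layer.

    `fetch` is any callable returning the raw payload (e.g. a REST call);
    failures are retried and logged so job health can be monitored.
    """
    for attempt in range(1, retries + 1):
        try:
            payload = fetch()
            break
        except Exception as exc:  # log and retry any vendor-side failure
            log.warning("attempt %d/%d for %s failed: %s", attempt, retries, vendor, exc)
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying

    # Date-partitioned path mirrors a typical ADLS Gen2 Bronze layout
    now = datetime.now(timezone.utc)
    out_dir = Path(bronze_root) / vendor / now.strftime("%Y/%m/%d")
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{now.strftime('%H%M%S')}_raw.json"
    out_file.write_text(json.dumps(payload))  # raw, untransformed landing
    log.info("landed %d bytes for %s at %s", out_file.stat().st_size, vendor, out_file)
    return out_file
```

The key point of the Bronze layer is that the payload is persisted exactly as received, so downstream transformations can always be replayed.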
- Transformation & Modeling (Silver & Gold Layers)
  - Utilize PySpark (Azure Databricks/Synapse) and SQL to clean, deduplicate, and standardize data in the Silver Layer
  - Execute Identity Resolution logic to stitch visitor and exhibitor profiles from multiple touchpoints into a Golden Record
  - Develop optimized datasets in the Gold Layer for high-performance reporting and predictive AI models
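The Identity Resolution step above can be sketched in plain Python (a simplified, in-memory sketch; in the Silver layer this logic would run as a PySpark job over full tables, and the identifier keys shown are assumptions):

```python
from collections import defaultdict

def build_golden_records(touchpoints):
    """Stitch touchpoint records that share any identifier (email, phone,
    device id) into one merged profile, i.e. a minimal Golden Record."""
    parent = {}

    def find(x):  # union-find: follow parents to the set representative
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link each record to every identifier it carries
    for i, rec in enumerate(touchpoints):
        for key in ("email", "phone", "device_id"):
            if rec.get(key):
                union(("rec", i), (key, rec[key]))

    # Merge all records in the same connected component into one profile
    golden = defaultdict(dict)
    for i, rec in enumerate(touchpoints):
        root = find(("rec", i))
        for k, v in rec.items():
            golden[root].setdefault(k, v)  # first observed value wins
    return list(golden.values())
```

Records that share no identifier remain separate profiles; any shared key (even transitively, A shares an email with B, B shares a phone with C) collapses them into one.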
- Infrastructure & Performance Optimization
  - Optimize SQL queries and Spark jobs to reduce Azure compute costs and minimize data latency
  - Maintain the Data Dictionary and technical documentation to ensure the Engine Room logic is transparent and scalable
  - Implement data masking and security protocols to ensure GDPR and internal-policy compliance
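One common approach to the data-masking requirement is keyed pseudonymization (so records stay joinable across tables) plus partial masking for display. A minimal sketch, with hypothetical key and helper names; a production key would come from Azure Key Vault, not source code:

```python
import hashlib
import hmac

# Hypothetical key for illustration; in production, load from Azure Key Vault
MASK_KEY = b"example-secret-from-key-vault"

def pseudonymize_email(email: str) -> str:
    """Replace a raw email with a keyed hash: rows remain joinable on the
    hash without exposing the personal identifier (GDPR pseudonymization)."""
    digest = hmac.new(MASK_KEY, email.lower().encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

def mask_email_for_display(email: str) -> str:
    """Partial mask for reports where analysts only need the shape."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"
```

Lower-casing before hashing makes the pseudonym stable across inconsistent vendor capitalization, which matters for joins.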
- Business Enablement
  - Support the Senior Data Manager in building the Semantic Layer that feeds our Power BI Data Window
  - Collaborate with the Events Tech team to troubleshoot data discrepancies between front-end apps and back-end tables
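Troubleshooting such discrepancies often starts with a simple set comparison of record IDs pulled from the two systems. A minimal sketch (function and field names are assumptions, not an existing tool):

```python
def reconcile_ids(frontend_ids, backend_ids):
    """Flag registration IDs present in the front-end app but missing from
    the back-end tables (and vice versa) as a first diagnostic step."""
    front, back = set(frontend_ids), set(backend_ids)
    return {
        "missing_in_backend": sorted(front - back),
        "missing_in_frontend": sorted(back - front),
        "matched": len(front & back),
    }
```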
Technical Requirements
- Experience: 3-5 years in Data Engineering with a focus on Cloud environments
- Core Azure Stack: Proven expertise in Azure Data Factory, Azure Synapse Analytics, and Data Lake Gen2
- Coding: High proficiency in SQL (complex joins/optimizations) and Python/PySpark
- Architectural Knowledge: Practical experience with the Medallion Architecture (Bronze/Silver/Gold)
- Integration: Strong experience working with REST APIs and JSON/XML data formats