Designs, builds, and optimizes scalable data infrastructure and pipelines to process and manage both structured and unstructured data, powering advanced AI solutions
Key Responsibilities
Design, implement, and maintain robust ETL/ELT processes to ingest, clean, and transform data from multiple sources
Process and manage complex datasets to support business and analytical needs
Conduct regular testing and enhancement of data pipelines to improve efficiency
Implement best practices for data validation, testing, and monitoring, proactively identifying and resolving issues to maintain data integrity
Collaborate with software engineers and AI/ML engineers to ensure seamless data accessibility
Implement best practices for data governance, security, and quality assurance
Document data architectures, lineage, and metadata to ensure transparency and reproducibility
Qualifications
5+ years of experience in data engineering, ETL development, or data warehousing
Demonstrated ability to design and manage ETL/ELT processes using tools like Databricks, Airflow, Luigi, or dbt to automate data workflows and transformations
Experience with big data architectures and with optimizing data pipelines
Deep familiarity with SQL for data manipulation and querying
Strong coding skills in Python and at least one other language (Scala or Java)
Hands-on experience with cloud-based data platforms (Azure/AWS/GCP)
Strong analytical skills for working with large-scale unstructured datasets
Ability to diagnose data issues, optimize performance, and communicate technical solutions effectively to both technical and non-technical stakeholders
Educational qualifications: Bachelor's degree in Computer Science, Engineering, or a related field required