

Hiring: Data Engineer – PySpark | Cloudera CDP | Informatica BDM
Location:
• Onsite – Dubai (Up to 23K AED)
• Offshore – Bangalore / Chennai (Up to 37 LPA)
Experience: 5+ Years
Notice Period: Immediate / Serving / Max 30 Days
We are hiring an experienced Data Engineer with strong hands-on expertise in PySpark, Cloudera Data Platform (CDP), and Informatica Big Data Management (BDM) to build and support enterprise-scale big data solutions.
Key Responsibilities
• Design, develop, and maintain optimized ETL pipelines using PySpark on CDP
• Implement data ingestion from multiple sources (databases, APIs, filesystems)
• Perform Spark and CDP performance tuning for large-scale workloads
• Build and enforce data quality checks, validation, and monitoring
• Automate workflows using Oozie / Airflow
• Develop and maintain Informatica BDM mappings and workflows
• Ensure security, compliance, and stability of data pipelines
• Collaborate with cross-functional teams to support data-driven initiatives
Technical Skills
• Advanced PySpark (RDDs, DataFrames, optimization techniques)
• Strong experience with Cloudera Data Platform (CDP) – Hive, Impala, HDFS, HBase
• Hands-on experience in Informatica Big Data Management (BDM)
• Strong knowledge of Oozie scheduling, HQL, and data partitioning
• Experience with SQL & NoSQL databases
• Exposure to Hadoop, Kafka and distributed systems
• Strong Linux shell scripting skills
• Understanding of security, compliance, and data governance frameworks
Preferred Experience
• Enterprise Banking / Financial / Fintech environments
• Agile methodology and CI/CD tools (Git, Jenkins, etc.)
• Experience working in production-grade distributed data ecosystems
Job Description for Informatica BDM:
Education
Experience
Minimum of 4 years of development and design experience in Informatica Big Data Management
Extensive knowledge of Oozie scheduling, HQL, Hive, HDFS (including usage of storage controllers), and data partitioning
Technical Skills
Extensive experience working with SQL and NoSQL databases
Linux OS configuration and use, including shell scripting.
Good hands-on experience with design patterns and their implementation.
Well versed in Agile, DevOps, and CI/CD principles (GitHub, Jenkins, etc.), and actively involved in solving and troubleshooting issues in a distributed services ecosystem.
Familiar with distributed services resiliency and monitoring in a production environment.
Experience in designing, building, testing, and implementing security systems, including identifying security design gaps in existing and proposed architectures and recommending changes or enhancements.
Responsible for adhering to established policies, following best practices, developing and maintaining an in-depth understanding of exploits and vulnerabilities, and resolving issues by taking appropriate corrective action.
Knowledge of designing security controls for source and data transfers, including cron jobs, ETLs, and JDBC/ODBC scripts.
Understanding of networking basics, including DNS, proxies, ACLs, policies, and troubleshooting.
High-level knowledge of data compliance and regulatory requirements, including but not limited to encryption, anonymization, data integrity, and policy control features in large-scale infrastructures.
Understanding of data sensitivity in logging, events, and in-memory data storage (for example, no card numbers or personally identifiable data in logs).
Implement wrapper solutions for new and existing components that have no or minimal security controls, to ensure compliance with bank standards.
Job ID: 137862741