
Principal Data Engineer:
Experience: 9+ Years
Work Mode: Onsite
Location: Bangalore
Principal Data Platform Engineer
Architecture: Lakehouse (Medallion: Bronze/Silver/Gold)
Compute: Apache Spark (Expert level)
Storage/Table Format: Delta Lake (Required), Iceberg (Strong Plus)
Transformation: dbt (Expert level)
Orchestration: Airflow, Cosmos
Infrastructure: Cloud-native (GCP preferred) + Databricks/commercial tooling
Patterns: Microservices, Event-driven, CI/CD, IaC (Terraform)
Core Technical Requirements
1. Data Engineering & Spark Internals
Deep Spark: You must understand RDDs, DataFrames, Spark SQL, and internals (Shuffle,
Partitioning, Memory Management, Catalyst Optimizer).
Pipeline Mastery: Building idempotent, self-healing ELT/ETL pipelines. Experience with Schema
Evolution and handling late-arriving data.
Lakehouse ACID: Expert knowledge of transaction logs, time travel, and file compaction in
Delta/Iceberg.
2. Software Architecture & Design
Engineering First: This isn't just SQL and scripts. You apply SOLID principles and design patterns,
and you write production-grade Python/Scala/Java.
Integration: Experience building and consuming microservices. Knowledge of API design
(REST/gRPC) and message brokers (Kafka, Pub/Sub).
System Design: Experience building a platform from scratch. You know how to design for 99.9%
availability and horizontal scalability.
3. Data Modeling & dbt
Modeling: Expert in dimensional modeling (Kimball), Data Vault 2.0, or OBT (One Big Table) for
high-performance analytics.
dbt Power User: Advanced dbt usage (Macros, Packages, Custom Tests, dbt Mesh). You treat
dbt projects like software repositories (version control, PR reviews, CI).
4. Cloud & Platform
Cloud Native: Deep understanding of IAM, VPCs, Object Storage, and serverless compute.
Migrations: Proven track record of moving petabyte-scale data from legacy systems (On-prem,
Redshift, Snowflake) to a Lakehouse without data loss.
Key Deliverables (First 6-12 Months)
Platform Zero: Evaluate, select, and deploy the foundational Lakehouse infrastructure.
Core Frameworks: Build reusable libraries and templates that the rest of the engineering team
uses to build pipelines.
Legacy Decommission: Design the technical roadmap for migrating all high-priority finance/business
data to the new stack.
Performance Baseline: Reduce Spark/cloud costs by at least 20% through better resource
management.
The Plus List
MLOps: Building feature stores and model deployment triggers.
GCP Specialization: BigQuery (as a Lakehouse layer), Dataproc, and Cloud Composer.
Observability: Implementing Data Quality monitoring (Great Expectations, Monte Carlo) and
OpenTelemetry.
Job ID: 146585763
Skills:
Snowflake, GitHub, Terraform, GitLab, Qualys, AWS, Airflow, dbt, Nessus, Fivetran
Skills:
Java, Hive, Hadoop, Scala, Spark, Kafka, Kubernetes, Python, Trino, Flink
Skills:
Compaction, ECS, TypeScript, Kafka, CloudFormation, Terraform, S3, Data Modeling, Data Quality, AWS, Kubernetes, Python, Docker, Spark, Data serving layers, Flink, Streaming pipelines, Data lakes, Schema evolution, Access Control, Pipeline monitoring, Partitioning, Data ingestion frameworks, Governance, Go, Data isolation, Lakehouse architecture, Apache Iceberg, Data observability
Skills:
Snowflake, Apache Spark, SQL, Django, Azure Functions, Postgres, Databricks, FastAPI, Python, Pytest, Azure Blob Storage, Ray, Prefect, GitHub Actions, dbt, Metaplane