We are looking for a skilled Data Engineer to design, build, and operate scalable data pipelines that power real-time processing and analytics. You will work on high-throughput data systems, ensuring reliability, performance, and maintainability across the data lifecycle, from ingestion through storage and search.
This role requires strong experience in distributed systems, stream processing, and cloud-native data infrastructure.
Responsibilities
- Design and implement real-time and batch data pipelines
- Build and maintain scalable streaming systems
- Develop and optimize stream processing jobs
- Ensure reliable ingestion from multiple internal and external data sources
- Design event schemas and data contracts
- Implement data validation, transformation, and enrichment logic
- Optimize storage layouts and lifecycle management strategies
- Improve system observability (metrics, logging, alerting)
- Troubleshoot and resolve performance bottlenecks in distributed systems
- Implement retry, dead-letter, and replay mechanisms
- Ensure data quality, consistency, and governance
- Collaborate with Backend, DevOps, and Security teams
Required Qualifications
- Proficiency in Java, Python, and SQL; strong software engineering fundamentals
- Experience with distributed messaging systems (e.g., Apache Kafka) and stream processing frameworks (e.g., Apache Flink)
- Knowledge of event-time processing, windowing, state management, and handling out-of-order events
- Experience with cloud storage/data lakes, relational databases, and search/indexing engines (e.g., OpenSearch / Elasticsearch)
- Familiarity with cloud platforms (AWS preferred), IaC (Terraform), CI/CD, and containerization (Docker)
- Strong problem-solving skills and the ability to design scalable, fault-tolerant data pipelines
Nice to Have
- Experience with workflow orchestration platforms
- Experience in security, log processing, or observability domains
- Experience with schema management tooling and formats (Avro, Protobuf, Schema Registry)
- Experience with data lake table formats (Iceberg, Hudi, Delta Lake)
- Experience with distributed query engines (Athena, Trino, Presto)
- Experience designing multi-tenant systems
- Experience with cost optimization in large-scale cloud environments
Soft Skills
- Strong problem-solving and debugging skills in distributed systems
- Ownership mindset with attention to reliability and quality
- Clear communication and documentation skills
- Ability to work cross-functionally
- Comfort working in fast-paced environments
What We're Looking For
- 7+ years of experience in data engineering or distributed systems
- Strong fundamentals in system design and scalability
- Proven experience operating production-grade data platforms
- Ability to balance performance, cost, and reliability