AI/ML Data Engineer

Synechron

United Arab Emirates, Dubai

5-7 Years

Job Description

Greetings,

Synechron has an immediate vacancy in Dubai for an AI/ML Data Engineer (Unstructured Data & LLM Integration) with over 7 years of experience.

Job Role: AI/ML Data Engineer (Unstructured Data & LLM Integration)

Location: Dubai

About Company:

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron's progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, servicing an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs, we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications, and more. Over the last 20+ years, our company has been honored with multiple employer awards, recognizing our commitment to our talented teams. Synechron serves top clients with a global workforce of 17,000+ and has 40 offices in 21 countries within key global markets. For more information on the company, please visit our website or LinkedIn community.

Diversity, Equity, and Inclusion

Synechron's Diversity, Equity, and Inclusion (DEI) program, Same Difference, was developed because we believe in a culture of listening, respect, and opportunity.

We each bring unique backgrounds, thoughts, talents, and experiences with us to work every day, and we know that by embracing them, we are creating an even greater Synechron. The best way to build a strong team is to value individual differences. So, it doesn't matter where you're from or what you've had to do to get here: if you have the skills, enthusiasm, and drive to make your mark, we'll support you like we support each other. Choose a career with us and let's pursue innovation, together.

Job Descriptions:

Education

Degree or postgraduate qualification in Computer Science or a related field (or equivalent industry experience).

Job Summary:

We are looking for an AI/ML-focused Data Engineer who brings deep expertise in building intelligent data pipelines for unstructured content and is experienced in integrating with modern machine learning ecosystems. The ideal candidate will have hands-on experience in PySpark and Python, a strong focus on document classification, cleansing, and quality metrics, and the ability to work with LLMs, vector databases, and Retrieval-Augmented Generation (RAG) frameworks. They will play a critical role in bridging data engineering and machine learning, enabling the development of AI-first applications across the enterprise.

Key Responsibilities:

Build robust, scalable data processing pipelines for unstructured documents (PDFs, emails, forms, etc.) using PySpark and Python.
Implement document cleansing, classification, and enrichment techniques to prepare high-quality data for AI/ML applications.
Develop and integrate data workflows that feed into LLM-based pipelines and support vector-based retrieval using RAG architectures.
Engineer vector embeddings, document chunking, and metadata tagging for semantic search and question-answering systems.
Collaborate closely with AI architects, AI/Data engineers, and platform teams to design end-to-end AI solutions.
Communicate data readiness, pipeline quality, and model integration strategies clearly to both technical and non-technical stakeholders.
Apply Agile methodologies and CI/CD best practices to deliver continuously evolving AI capabilities.

Required Skills:

Overall 5+ years of commercial experience, with 2+ years in a relevant role.
Strong proficiency in PySpark and distributed data frameworks.
Solid experience in core Python, including ML/AI libraries (e.g., Hugging Face Transformers, LangChain, FAISS).
Proven expertise in processing unstructured data and document intelligence (OCR, NLP, classification, tagging).
Familiarity with vector databases (e.g., Redis) and embedding models for RAG pipelines.
Understanding of the LLM lifecycle, including fine-tuning, inference, and prompt engineering.
Experience working in agile environments, collaborating with cross-functional teams.
Excellent communication skills with the ability to interface with both technical and business stakeholders.