Job Description: Full Stack Data Scientist (Azure AI Engineer)
Location: Dubai
Experience: 8+ years (Data Science / AI Engineering / Applied ML)
Job Type: Contract
Job Summary
We are looking for a highly capable Full Stack Data Scientist / Azure AI Engineer who can build end-to-end AI products: data + ML/DL/CV models + agentic workflows + APIs + UI + scalable deployment on Kubernetes (AKS). The role requires deep expertise in the Azure AI ecosystem (Azure Machine Learning, Azure AI Foundry, Azure AI Search) and strong hands-on experience building AI agents using LangChain, LangGraph, and/or Microsoft Agent Framework, with Langfuse for tracing, evaluation, and observability. The ideal candidate has shipped production systems with measurable business impact and can operate them reliably through strong MLOps/LLMOps practices.
Key Responsibilities
1) End-to-End AI Product Delivery
- Own delivery end to end: problem definition, architecture, development, deployment, monitoring, and iterative improvement.
- Translate business needs into robust AI solutions with clear KPIs, timelines, and measurable outcomes.
- Build AI applications that are secure, scalable, maintainable, and production-ready.
2) AI Agents & Agentic Workflows (Must-Have)
- Design, implement, and orchestrate AI agents capable of planning, tool use, function calling, retrieval, and multi-step execution.
- Build agent systems using:
  - LangChain for tool/function orchestration, retrieval, and integrations
  - LangGraph for stateful, multi-step, resilient agent workflows
  - Microsoft Agent Framework for enterprise-grade agent patterns and integrations
- Implement agent patterns: routing, task decomposition, multi-agent collaboration, memory, verification, retries/fallbacks, and human-in-the-loop approvals.
- Apply security & safety: prompt-injection defenses, tool permissioning, grounding/citations, policy checks, and audit logs.
3) LLMOps / Observability / Evaluation (Langfuse)
- Implement Langfuse (or equivalent) for:
  - prompt and trace logging, latency/cost monitoring
  - dataset-based evaluation, regression testing, and quality gates
  - feedback loops and continuous improvement of prompts/agents
- Establish evaluation frameworks for RAG/agents: retrieval metrics, answer quality, hallucination checks, and guardrail effectiveness.
4) Azure Machine Learning & MLOps (Must-Have)
- Build/operate ML workflows using Azure Machine Learning:
  - training jobs, compute, environments, pipelines, MLflow tracking
  - model registry and promotion, managed online endpoints
- Implement CI/CD for model + application releases and MLOps practices: versioning, reproducibility, automated testing, and retraining triggers.
5) Azure AI Foundry & Azure AI Search (Must-Have)
- Build GenAI solutions using Azure AI Foundry (prompt flows/orchestration, deployment integration, evaluation workflows).
- Implement RAG pipelines using Azure AI Search:
  - ingestion/indexing of structured & unstructured data
  - vector + hybrid search, semantic ranking (where applicable), filtering, and relevance tuning
  - citations, metadata-based access control, and indexing automation
6) ML/DL & Computer Vision (Strong Requirement)
- Develop and deploy strong ML/DL solutions, including Computer Vision:
  - classification, detection, segmentation, OCR/document understanding, anomaly/defect detection
- Conduct experimentation, tuning, and optimization (performance, robustness, cost).
- Productionize CV pipelines with monitoring and continuous improvement.
7) Backend/API Engineering (FastAPI + Node.js)
- Build production APIs for models and agents using FastAPI (Python): async endpoints, OpenAPI/Swagger, auth, middleware, and validation.
- Build service orchestration and integrations using Node.js where appropriate.
- Implement secure API patterns: authentication/authorization (Microsoft Entra ID, formerly Azure AD, and RBAC patterns), rate-limiting, caching, and error handling.
8) Frontend Engineering (React)
- Build modern UIs in React for AI applications (agent chat UI, dashboards, workflow screens).
- Support streaming responses, citations, session memory, feedback capture, and user analytics.
9) Kubernetes/AKS Deployment & Operations
- Containerize services using Docker and deploy on Kubernetes (AKS preferred).
- Implement scaling, rollouts, secrets/config management, ingress, and reliability patterns.
- Set up monitoring/telemetry using Azure Monitor/App Insights (or equivalent), alerts, and runbooks.
Required Skills and Qualifications
Mandatory Certifications (Must)
- AI-102 (Microsoft Certified: Azure AI Engineer Associate)
- DP-100 (Microsoft Certified: Azure Data Scientist Associate)
Core Technical Skills
- Agents/Frameworks: Strong hands-on experience with LangChain, LangGraph, and Microsoft Agent Framework.
- LLMOps: Strong experience with Langfuse for tracing/evaluation/monitoring (or equivalent tooling, with Langfuse preferred).
- Azure: Azure ML, Azure AI Foundry, Azure AI Search; plus Key Vault, Storage, App Insights/Monitor as needed.
- Programming: Strong Python; API development with FastAPI; Node.js for services/integrations.
- Frontend: React for production UI development.
- ML/DL/CV: Proven hands-on depth in ML/DL and Computer Vision.
- Deployment: Docker + Kubernetes/AKS.
- Data: Strong SQL; experience with structured + unstructured data.
Proven Experience (Non-Negotiable)
- Demonstrated end-to-end delivery of AI applications in production (build, deploy, operate), with measurable impact.
Preferred Qualifications
- Experience in real estate / construction domain AI use cases (valuation, forecasting, risk, customer support automation).
- Exposure to graph databases (e.g., Neo4j) and vector search/vector databases for AI applications.
- Extra certifications (nice-to-have): Azure Fundamentals (AZ-900), Azure Developer (AZ-204), Kubernetes (CKA/CKAD), Databricks ML.
What Success Looks Like (Outcomes)
- Delivered production-grade AI solutions end-to-end: data, model, agentic workflow, API, UI, AKS deployment, and monitoring.
- Established strong LLMOps with Langfuse: traceability, evaluation, cost controls, and reliability improvements.
- Built reliable, secure, observable systems with measurable business impact (time saved, accuracy gains, automation rate, cost reduction).
- Demonstrated strong ownership from POC to production and post-launch iteration.