Role Purpose
The Linguistic Data Analyst is responsible for collecting, analyzing, organizing, and cleaning multilingual conversational data, with a strong focus on diplomatic and formal terminology, to prepare high-quality datasets for training Speech-to-Text (STT) AI models.
This role is critical to ensuring linguistic accuracy, terminology consistency, and data readiness for AI model development, particularly in government, diplomatic, and formal communication domains.
Key Responsibilities
Linguistic Data Collection:
- Collect and curate conversational audio and text data (meetings, interviews, speeches)
- Work with multilingual datasets, primarily Arabic and English
- Ensure compliance with privacy and data governance standards
Data Cleaning & Structuring:
- Clean datasets by removing noise, duplicates, and inconsistencies
- Normalize formal and semi-formal language usage
- Organize data by speaker, context, and formality level
Linguistic & Terminology Analysis:
- Extract and standardize diplomatic and official terminology
- Build and maintain a diplomatic glossary
AI Training Data Preparation:
- Prepare AI-ready datasets with timestamps and metadata
- Support annotation teams with linguistic guidelines
Collaboration & Documentation:
- Work with AI Engineers, Data Scientists, and PMs
- Document standards and methodologies
Requirements
Required Qualifications
Education:
- Bachelor's degree in Linguistics, Translation, Arabic/English Studies, or a related field
Core Skills:
- Strong linguistic analysis skills
- Experience with conversational or textual datasets
- High attention to detail
Technical Skills (Preferred):
- Familiarity with STT and NLP concepts
- Experience with data annotation workflows
Languages:
- Arabic: Fluent (mandatory)
- English: Fluent (mandatory)
- Additional languages are a plus