Computer Vision Engineer

Summit Technology Solutions

Egypt, Cairo

2-5 Years

Posted 21 hours ago

Job Description

Key Responsibilities

Fine-tune pre-trained OCR models (e.g., DeepSeek-OCR, Dots-OCR, EasyOCR) on Arabic handwritten datasets.
Collect, annotate, and preprocess Arabic handwritten data including text, forms, and documents.
Design and implement data augmentation pipelines tailored for Arabic script variability (diacritics, connected letters, cursive styles).
Evaluate model performance using CER (Character Error Rate) and WER (Word Error Rate) metrics.
Integrate trained OCR models into backend APIs or document processing pipelines.
Continuously improve model accuracy through iterative training, hyperparameter tuning, and error analysis.
Develop real-time hand gesture and sign language recognition models using video and image data.
Implement pose estimation techniques (e.g., MediaPipe, OpenPose) to extract hand and body keypoints.
Build and train classification models (CNNs, LSTMs, Transformers) for static and dynamic sign recognition.
Work with domain experts and the deaf/hard-of-hearing community to gather and validate sign language datasets.
Optimize models for real-time performance on edge devices or web/mobile applications.
Document model architecture, training procedures, and deployment steps for the engineering team.

General Engineering Responsibilities

Collaborate with backend engineers to deploy models using FastAPI, Flask, or similar frameworks.
Write clean, well-documented, and testable Python code following best engineering practices.
Participate in code reviews, sprint planning, and technical architecture discussions.
Stay up to date with the latest research in OCR, NLP for Arabic text, and gesture recognition.
Contribute to internal tooling for data labeling, model monitoring, and performance dashboards.

Required Qualifications

Bachelor's or Master's degree in Computer Science, Electrical Engineering, AI, or a related field.
2–3 years of hands-on experience in computer vision or machine learning engineering.
Strong proficiency in Python; experience with PyTorch or TensorFlow for model development.
Solid understanding of deep learning architectures (CNNs, RNNs/LSTMs, Transformers, attention mechanisms) and of parameter-efficient fine-tuning methods such as QLoRA.
Experience with image preprocessing and augmentation libraries (Albumentations, OpenCV, PIL).
Familiarity with OCR pipelines and text recognition challenges.
Experience with fine-tuning models on custom datasets and transfer learning techniques.
Ability to work with version control (Git) and experiment tracking tools (MLflow, Weights & Biases).

Preferred Qualifications

Prior experience working with Arabic NLP or Arabic OCR datasets is a strong plus.
Experience with sign language datasets or human pose/gesture estimation frameworks (MediaPipe, OpenPose).
Familiarity with Hugging Face Transformers and pre-trained vision-language models (DeepSeek-OCR, Dots-OCR, EasyOCR).
Knowledge of model quantization, pruning, and ONNX/TensorRT deployment for inference optimization.
Experience building REST APIs for model serving (FastAPI, Flask).
Exposure to annotation tools such as Label Studio, CVAT, or Roboflow.
Published research, open-source contributions, or Kaggle competition experience.