
Summit Technology Solutions

Computer Vision Engineer

  • Posted 21 hours ago

Job Description

Key Responsibilities

  • Fine-tune pre-trained OCR models (e.g., DeepSeek-OCR, Dots-OCR, EasyOCR) on Arabic handwritten datasets.
  • Collect, annotate, and preprocess Arabic handwritten data including text, forms, and documents.
  • Design and implement data augmentation pipelines tailored for Arabic script variability (diacritics, connected letters, cursive styles).
  • Evaluate model performance using CER (Character Error Rate) and WER (Word Error Rate) metrics.
  • Integrate trained OCR models into backend APIs or document processing pipelines.
  • Continuously improve model accuracy through iterative training, hyperparameter tuning, and error analysis.
  • Develop real-time hand gesture and sign language recognition models using video and image data.
  • Implement pose estimation techniques (e.g., MediaPipe, OpenPose) to extract hand and body keypoints.
  • Build and train classification models (CNNs, LSTMs, Transformers) for static and dynamic sign recognition.
  • Work with domain experts and the deaf/hard-of-hearing community to gather and validate sign language datasets.
  • Optimize models for real-time performance on edge devices or web/mobile applications.
  • Document model architecture, training procedures, and deployment steps for the engineering team.

General Engineering Responsibilities

  • Collaborate with backend engineers to deploy models using FastAPI, Flask, or similar frameworks.
  • Write clean, well-documented, and testable Python code following best engineering practices.
  • Participate in code reviews, sprint planning, and technical architecture discussions.
  • Stay up to date with the latest research in OCR, NLP for Arabic text, and gesture recognition.
  • Contribute to internal tooling for data labeling, model monitoring, and performance dashboards.

Required Qualifications

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, AI, or a related field.
  • 2–3 years of hands-on experience in computer vision or machine learning engineering.
  • Strong proficiency in Python; experience with PyTorch or TensorFlow for model development.
  • Solid understanding of deep learning architectures (CNNs, RNNs/LSTMs, Transformers, attention mechanisms) and parameter-efficient fine-tuning methods such as QLoRA.
  • Experience with image preprocessing and augmentation libraries (Albumentations, OpenCV, PIL).
  • Familiarity with OCR pipelines and text recognition challenges.
  • Experience with fine-tuning models on custom datasets and transfer learning techniques.
  • Ability to work with version control (Git) and experiment tracking tools (MLflow, Weights & Biases).

Preferred Qualifications

  • Prior experience working with Arabic NLP or Arabic OCR datasets is a strong plus.
  • Experience with sign language datasets or human pose/gesture estimation frameworks (MediaPipe, OpenPose).
  • Familiarity with Hugging Face Transformers and pre-trained vision-language models (DeepSeek-OCR, Dots-OCR, EasyOCR).
  • Knowledge of model quantization, pruning, and ONNX/TensorRT deployment for inference optimization.
  • Experience building REST APIs for model serving (FastAPI, Flask).
  • Exposure to annotation tools such as Label Studio, CVAT, or Roboflow.
  • Published research, open-source contributions, or Kaggle competition experience.

More Info

Job ID: 146063747
