Integrant is looking for game changers to join our team as a Lead AI Platform Engineer.
The Lead AI Platform Engineer is responsible for bridging AI workloads with production-grade infrastructure, with a strong focus on the NVIDIA AI stack, enabling high-performance, scalable, and optimized AI systems.
This role focuses on model optimization, runtime efficiency, and GPU utilization, ensuring that AI workloads are production-ready, cost-efficient, and performant across enterprise environments.
Roles and Responsibilities:
- Translate AI/ML workloads into optimized infrastructure and deployment strategies
- Optimize model performance across GPU environments (latency, throughput, memory utilization)
- Design and implement inference and training pipelines using NVIDIA stack tools (TensorRT, Triton, NIM)
- Convert and optimize models across frameworks (PyTorch → ONNX → TensorRT; a minimal export sketch follows this list)
- Analyze and resolve performance bottlenecks using profiling tools (GPU, memory, network)
- Improve GPU utilization and scheduling efficiency across clusters
- Design scalable distributed training and inference architectures
- Work closely with customers to define AI infrastructure strategies and deployment models
- Support production deployments including monitoring, rollback, and performance validation
- Conduct applied research to improve model efficiency and infrastructure utilization
- Mentor team members on AI infrastructure, optimization, and GPU systems
- Track experiments with tools such as MLflow, W&B, or Neptune, logging parameters, metrics, and artifacts for comparison (a minimal MLflow sketch follows this list)
- Detect and diagnose model degradation post-deployment: concept drift, data pipeline changes, and traffic pattern shifts (a toy drift check follows this list)
- Apply root cause analysis (RCA) to ML systems: isolating variables and reproducing issues
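To make the PyTorch → ONNX → TensorRT path above concrete, here is a minimal export sketch; the model, tensor names, and file paths are illustrative assumptions, not part of the role description:

```python
import torch
import torchvision

# Hypothetical example model; any torch.nn.Module with a static
# forward signature can be exported the same way.
model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX with a dynamic batch dimension so a TensorRT
# engine can later be built for multiple batch sizes.
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)

# A typical next step (outside Python) is building the engine, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```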
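Experiment tracking in the sense used above can be as simple as the following MLflow sketch; the run name, parameters, and metric values are placeholders:

```python
import mlflow

# Illustrative values only; in practice these come from the
# training or optimization job being tracked.
with mlflow.start_run(run_name="trt-fp16-bs32"):
    mlflow.log_param("precision", "fp16")
    mlflow.log_param("batch_size", 32)

    # Metrics can be logged per step so runs are comparable over time.
    for step, latency_ms in enumerate([12.4, 11.9, 11.7]):
        mlflow.log_metric("p50_latency_ms", latency_ms, step=step)

    # Artifacts (engines, plots, configs) are attached to the run.
    mlflow.log_artifact("model.onnx")
```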
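For post-deployment degradation, one common lightweight check is a two-sample Kolmogorov-Smirnov test per feature; the distributions below are synthetic stand-ins for a training-time reference and a window of live traffic:

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder distributions; in production these would be a
# training-time reference sample and recent production inputs.
reference = np.random.normal(0.0, 1.0, size=5_000)
live = np.random.normal(0.3, 1.0, size=5_000)

# A small p-value suggests the live feature distribution has
# drifted from the reference and warrants investigation.
stat, p_value = ks_2samp(reference, live)
if p_value < 0.01:
    print(f"possible drift: KS={stat:.3f}, p={p_value:.2e}")
```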
Requirements
- 8+ years of experience with AI/ML systems, HPC, and AI infrastructure
- Strong proficiency in Python
- Strong experience with GPU-based AI workloads and performance optimization
- Deep understanding of model optimization techniques (quantization, pruning, batching; a quantization sketch follows this list)
- Hands-on experience with:
- PyTorch
- ONNX / ONNX Runtime
- TensorRT / TensorRT-LLM
- Triton Inference Server
- Knowledge of CUDA, cuDNN, and GPU architecture fundamentals
- Experience with distributed systems (multi-GPU / multi-node)
- Familiarity with:
- NCCL communication
- NVLink / InfiniBand
- Kubernetes or Slurm for orchestration
- Experience deploying AI models into production environments
- Ability to analyze system bottlenecks (compute, memory, network)
- Experience with profiling tools (Nsight, TensorRT profiler, etc.)
- Knowledge of cost optimization strategies for GPU workloads
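As one concrete instance of the quantization requirement above, here is a minimal ONNX Runtime dynamic quantization sketch; the file names are placeholders, and INT8 accuracy should always be validated against the FP32 baseline:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization stores weights as INT8 and computes
# activation scales at runtime; paths are placeholders.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,
)
```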
Nice to Have
- Experience with NVIDIA NIM and NGC ecosystem
- Exposure to Megatron-LM, NeMo, or large-scale LLM training/inference
- Experience with LLM optimization techniques (KV cache, batching strategies)
- Familiarity with MLOps practices and CI/CD for AI systems
- Experience in customer-facing architecture or consulting roles
- Familiarity with hybrid cloud / on-prem HPC environments
Benefits
- Salary paid in USD
- Career advancement opportunities every six months
- Supportive and friendly work environment
- Premium medical insurance (employee + family)
- English language development courses
- Interest-free loans repaid over 2.5 years
- Technical development courses
- Planned overtime program (POP)
- Employment referral program
- Premium location in Maadi
- Social insurance