Integrant is looking for game changers to join our team as a Lead AI Platform Engineer.
The Lead AI Platform Engineer is responsible for bridging AI workloads with production-grade infrastructure, with a strong focus on the NVIDIA AI stack, enabling high-performance, scalable, and optimized AI systems.
This role focuses on model optimization, runtime efficiency, and GPU utilization, ensuring that AI workloads are production-ready, cost-efficient, and performant across enterprise environments.
Roles and Responsibilities:
- Translate AI/ML workloads into optimized infrastructure and deployment strategies
- Optimize model performance across GPU environments (latency, throughput, memory utilization)
- Design and implement inference and training pipelines using NVIDIA stack tools (TensorRT, Triton, NIM)
- Convert and optimize models across frameworks (PyTorch → ONNX → TensorRT; a minimal export sketch follows this list)
- Analyze and resolve performance bottlenecks using profiling tools (GPU, memory, network)
- Improve GPU utilization and scheduling efficiency across clusters
- Design scalable distributed training and inference architectures
- Work closely with customers to define AI infrastructure strategies and deployment models
- Support production deployments including monitoring, rollback, and performance validation
- Conduct applied research to improve model efficiency and infrastructure utilization
- Mentor team members on AI infrastructure, optimization, and GPU systems
- Track experiments with tools such as MLflow, W&B, or Neptune, logging parameters, metrics, and artifacts for comparison (a minimal MLflow sketch follows this list)
- Detect and diagnose model degradation post-deployment: concept drift, data pipeline changes, and traffic pattern shifts (a toy drift check follows this list)
- Apply root cause analysis (RCA) to ML systems: isolating variables and reproducing issues
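To make the PyTorch → ONNX → TensorRT path above concrete, here is a minimal export sketch; the model, tensor names, and file paths are illustrative assumptions, not part of the role description:

```python
import torch
import torchvision

# Hypothetical example model; any torch.nn.Module with a static
# forward signature can be exported the same way.
model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX with a dynamic batch dimension so a TensorRT
# engine can later be built for multiple batch sizes.
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)

# A typical next step (outside Python) is building the engine, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```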
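Experiment tracking in the sense used above can be as simple as the following MLflow sketch; the run name, parameters, and metric values are placeholders:

```python
import mlflow

# Illustrative values only; in practice these come from the
# training or optimization job being tracked.
with mlflow.start_run(run_name="trt-fp16-bs32"):
    mlflow.log_param("precision", "fp16")
    mlflow.log_param("batch_size", 32)

    # Metrics can be logged per step so runs are comparable over time.
    for step, latency_ms in enumerate([12.4, 11.9, 11.7]):
        mlflow.log_metric("p50_latency_ms", latency_ms, step=step)

    # Artifacts (engines, plots, configs) are attached to the run.
    mlflow.log_artifact("model.onnx")
```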
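For post-deployment degradation, one common lightweight check is a two-sample Kolmogorov-Smirnov test per feature; the distributions below are synthetic stand-ins for a training-time reference and a window of live traffic:

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder distributions; in production these would be a
# training-time reference sample and recent production inputs.
reference = np.random.normal(0.0, 1.0, size=5_000)
live = np.random.normal(0.3, 1.0, size=5_000)

# A small p-value suggests the live feature distribution has
# drifted from the reference and warrants investigation.
stat, p_value = ks_2samp(reference, live)
if p_value < 0.01:
    print(f"possible drift: KS={stat:.3f}, p={p_value:.2e}")
```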
Requirements
- 8+ years of experience with AI/ML systems, HPC, and AI infrastructure
- Strong proficiency in Python
- Strong experience with GPU-based AI workloads and performance optimization
- Deep understanding of model optimization techniques (quantization, pruning, batching; a quantization sketch follows this list)
- Hands-on experience with:
- PyTorch
- ONNX / ONNX Runtime
- TensorRT / TensorRT-LLM
- Triton Inference Server
- Knowledge of CUDA, cuDNN, and GPU architecture fundamentals
- Experience with distributed systems (multi-GPU / multi-node)
- Familiarity with:
- NCCL communication
- NVLink / InfiniBand
- Kubernetes or Slurm for orchestration
- Experience deploying AI models into production environments
- Ability to analyze system bottlenecks (compute, memory, network)
- Experience with profiling tools (Nsight, TensorRT profiler, etc.)
- Knowledge of cost optimization strategies for GPU workloads
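As one concrete instance of the quantization requirement above, here is a minimal ONNX Runtime dynamic quantization sketch; the file names are placeholders, and INT8 accuracy should always be validated against the FP32 baseline:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization stores weights as INT8 and computes
# activation scales at runtime; paths are placeholders.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,
)
```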
Nice to Have
- Experience with NVIDIA NIM and NGC ecosystem
- Exposure to Megatron-LM, NeMo, or large-scale LLM training/inference
- Experience with LLM optimization techniques (KV cache, batching strategies)
- Familiarity with MLOps practices and CI/CD for AI systems
- Experience in customer-facing architecture or consulting roles
- Familiarity with hybrid cloud / on-prem HPC environments
Benefits
- Salary paid in USD
- Career advancement opportunities every six months
- Supportive and friendly work environment
- Premium medical insurance (employee + family)
- English language development courses
- Interest-free loans repaid over 2.5 years
- Technical development courses
- Planned overtime program (POP)
- Employment referral program
- Premium location in Maadi
- Social insurance