Consultant (HPC)

Norconsult Telematics

United Arab Emirates, Dubai

7-9 Years

Save

Posted a day ago
Be among the first 10 applicants

Early Applicant

Job Description

The job is located in RIYADH, SAUDI ARABIA

Position Objective

The HPC Expert will architect, deploy, and optimise large-scale, performance-critical HPC environments supporting tightly coupled MPI workloads, GPU-accelerated applications, and data-intensive simulations. This role is focused on extracting maximum performance from compute, network, and storage layers through low-level tuning, benchmarking, and continuous optimisation

Job Description and Responsibilities.

Design, Architect, and operate HPC clusters of 100500+ nodes, including CPU-only and GPU-accelerated systems optimised for tightly coupled parallel workload
Design and tune MPI-based environments (OpenMPI, Intel MPI, MPICH), including process pinning, CPU affinity, memory binding, and topology-aware scheduling.
Optimise NUMA architectures, huge pages, CPU isolation, BIOS/firmware settings, and kernel parameters for latency- and throughput-sensitive workloads.
Deploy and tune high-speed interconnects (InfiniBand / RDMA / RoCE), including fabric configuration, QoS, congestion control, and performance validation.
Configure and operate GPU-accelerated HPC systems, including CUDA-aware MPI, NCCL, GPUDirect RDMA, and multi-GPU/NVLink topologies.
Manage and tune job schedulers (Slurm, PBS, LSF) with advanced configurations such as topology-aware scheduling, GPU binding, fair-share policies, and preemption.
Design and optimise parallel file systems (Lustre, GPFS, BeeGFS), including metadata tuning, stripe configuration, and I/O performance optimisation.
Develop and maintain automation frameworks (Ansible, Bash, Python) for bare-metal provisioning, cluster expansion, and repeatable performance configurations.
Perform HPC benchmarking and performance analysis using tools such as HPL, IOzone, IOR, FIO, OSU Micro-Benchmarks, and application-level profiling.
Partner with researchers and engineering teams to profile, debug, and tune applications, improving scalability, efficiency, and time-to-solution.
Implement system-level security, reliability, and compliance controls without compromising performance.
Lead capacity planning, scalability assessments, and next-generation HPC architecture evaluations.

Qualifications & Skills

Bachelor's degree in Computer Science, Engineering, or a related technical field.
7+ years of deep, hands-on HPC experience in performance-critical environments (academic labs, national labs, research centres, or enterprise HPC).
Expert-level knowledge of MPI programming environments and performance tuning for large-scale parallel jobs.
Strong understanding of NUMA, CPU pinning, memory locality, cache behaviour, and kernel-level tuning.
Hands-on experience with RDMA-capable networks (InfiniBand / RoCE), including fabric monitoring and troubleshooting.
Proven experience with GPU-enabled HPC clusters, CUDA, CUDA-aware MPI, and GPUDirect technologies.
Advanced experience managing Slurm / PBS / LSF in large, heterogeneous clusters.
Deep expertise in parallel storage performance tuning (Lustre, GPFS, BeeGFS).
Strong Linux internals knowledge on Red Hat Enterprise Linux.
RHCE or equivalent Linux certification preferred.
Experience with benchmark-driven design decisions and performance regression analysis.
Ability to communicate low-level performance issues to both technical and non-technical stakeholders.