Müller's Solutions is hiring an AI Infrastructure & MLOps Engineer for a 6-month contract. This role is
primarily operations-focused (90%), with hands-on involvement in the
implementation, configuration, and setup of AI infrastructure and MLOps workflows.
You will play a key role in managing, operating, and guiding the deployment of a
strategic AI environment, working closely with the customer as both a technical advisor and a hands-on engineer.
Role Responsibilities
- Operate and maintain AI infrastructure and MLOps platforms in a production environment.
- Monitor, manage, and troubleshoot Kubernetes-based AI workloads.
- Plan and execute acceptance testing for AI infrastructure and platforms.
- Ensure stability, performance, and availability of AI systems.
- Support day-to-day operational tasks across the compute, storage, and networking layers.
- Install and configure the NVIDIA Enterprise AI Stack (NVAI).
- Configure and manage MLOps platforms such as Kubeflow and MLflow.
- Assist in setting up end-to-end AI workflows, including data pipelines.
- Support the initial implementation phase of the AI environment.
- Act as a technical guide and advisor to the customer during the early stages of their AI adoption.
Requirements
What You Should Bring to This Role
Technical Requirements
AI / MLOps Stack
- Proven, hands-on experience with the NVIDIA Enterprise AI Stack
- Familiarity with Ubuntu Linux
- Experience with Kubernetes
- Knowledge of Kubeflow / MLflow
- Experience with QFLOW (an open-source AI data pipeline management tool)
Programming & Automation
- 4-6 years of practical experience with:
  - Python
  - Jupyter Notebook / JupyterLab
- Competence in writing, testing, and maintaining operational scripts and AI workflows
Infrastructure Experience
Practical experience with enterprise infrastructure, including:
- Dell PowerScale (5 nodes)
- XE Server (1 node)
- Dell R570 Servers (5 nodes)
- Dell Network Switches (2 switches)
- GPU-based AI servers (in a small-scale environment)
Environment Overview
- Initial AI implementation for the customer
- Compact configuration:
  - 1 GPU server
  - 1 PowerScale cluster
  - 5 control plane servers
- Opportunity to shape best practices from the ground up
Nice to have:
- Familiarity with data processing frameworks such as Apache Spark or Hadoop
- Understanding of ML model monitoring and logging practices to ensure system reliability
- Experience applying security best practices to AI systems