We are looking for an MLOps / Platform Engineer. In this hands-on role, you'll design and implement secure, scalable deployment pipelines in on-premises and private cloud environments, directly supporting machine learning workloads and GenAI solutions.
Key Responsibilities:
- Operate and manage Kubernetes clusters in production, designing deployments for maximum efficiency and reliability.
- Develop and maintain CI/CD pipelines using tools like Jenkins, ensuring seamless automation from build to deployment.
- Implement observability practices, including logging and metrics, to quickly identify and resolve issues.
- Deploy ML models and services, ensuring high performance and scalability while optimizing inference processes.
- Work closely with AI engineers to transition prototypes into production-grade deployments and maintain thorough documentation.
Requirements:
- 3 to 8 years in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
- At least 2 years of experience with Kubernetes in a production environment.
- Proficiency in Docker, CI/CD practices, and Linux.
- Knowledge of GitOps tools (e.g., ArgoCD, Flux) and Infrastructure as Code (Terraform/Ansible).
- Experience deploying machine learning workloads and familiarity with GenAI technologies.
- Strong ownership mindset with excellent documentation and collaboration skills.
- Ability to work directly with clients and navigate complex environments.