We are seeking a Senior DevOps Consultant with deep hands-on expertise in Rancher Kubernetes on bare metal, Kubernetes Operators, and high-availability design across core data services (PostgreSQL, Kafka, Redis, Vault, Keycloak). The ideal consultant has production experience integrating hardware load balancers (preferably Fortinet, alternatively F5) with Kubernetes ingress, and can deliver a repeatable, scripted cluster bootstrap and observability stack (Grafana, Prometheus, Jaeger, plus EFK or VictoriaLogs).
You will work onsite in Dubai with the Tejori team to design, implement, document, and hand over a robust, secure, and monitored platform to run Emcode's commercial workloads.
Key Responsibilities
Kubernetes & Rancher
- Architect and deploy a highly available Rancher-managed Kubernetes cluster on bare metal (multi-master, etcd quorum sizing, worker pools).
- Implement cluster provisioning and lifecycle automation (bootstrapping scripts and/or Pulumi/Ansible-based flows).
- Configure Rancher projects, namespaces, RBAC, and Fleet for multi-environment governance.
Networking & Ingress
- Design and implement the ingress traffic path from Fortinet hardware load balancers (or F5) to the Rancher/Kubernetes ingress layer (Layer 4/7).
- Configure ingress controllers (e.g., NGINX or HAProxy), TLS termination, mTLS where applicable, and WAF/security policies at the LB and cluster edge.
Data & Platform Services on Kubernetes
- Deploy and harden PostgreSQL, Kafka, Redis, Vault, and Keycloak using Operators and/or well-supported Helm charts.
- Configure backup/restore, secrets management (Vault/KMS), credential rotation, and HA (replication, quorum, partitions, failover).
- Ensure data durability and performance tuning for bare-metal constraints (storage classes, CSI drivers, network tuning).
Observability & Logging
- Stand up Prometheus + Alertmanager + Grafana for metrics and dashboards.
- Deploy Jaeger (or the OpenTelemetry Collector exporting to Jaeger) for distributed tracing.
- Implement EFK (Elasticsearch/Fluentd/Kibana) or VictoriaLogs (the VictoriaMetrics log store, a Loki alternative) with a retention and index strategy.
Automation & IaC
- Create idempotent scripts and/or Pulumi stacks for cluster bootstrap, app provisioning, and infra config.
- Develop Ansible roles/playbooks for OS hardening, package prep, and repeatable node bring-up.
Security & Compliance
- Enforce network policies, RBAC/ABAC, Pod Security Admission (PSA), image signing/scanning, and registry policies.
- Integrate Keycloak for SSO into Rancher, Grafana, and app workloads.
- Establish backup, DR, and secrets management standards (Vault policies, transit encryption).
Documentation & Handover
- Produce as-built documentation, runbooks, troubleshooting guides, and DR procedures.
- Conduct knowledge transfer sessions and tabletop failover tests with the team.
Required Qualifications
- 10+ years of DevOps/SRE experience with production Kubernetes (Rancher experience required).
- Strong command of Kubernetes Operators, Helm, and ingress controllers.
- Proven deployments of PostgreSQL, Kafka, Redis, Vault, Keycloak on K8s.
- Production experience with Fortinet load balancers (highly desirable) or F5.
- Pulumi (preferred) and Ansible for IaC and configuration.
- Monitoring stack: Grafana, Prometheus, Alertmanager, Jaeger; EFK or VictoriaLogs.
- Proficiency in Bash and at least one high-level language (Go or Python).
- HA/DR design on bare metal, including storage/CNI selection and tuning.
- Excellent documentation and team enablement skills.