Title: Senior Software Engineer - High-Scale AI CXM Platform
Tech Stack: Ruby on Rails · Python · PostgreSQL · Elasticsearch · Redis · RabbitMQ · Kubernetes · AWS / GCP · APISIX · Grafana · Loki
What You'll Do:
You'll be building and evolving a large-scale, event-driven backend platform that processes massive volumes of customer experience data in real time across multiple digital channels.
The environment is highly distributed, with 100+ microservices handling billions of events, where reliability, throughput, and fault tolerance are critical.
You'll work across:
- Designing and operating high-volume, event-based systems that reliably move and process data at scale
- Building and scaling messaging infrastructure (RabbitMQ), including queue stability, consumer scaling strategies, and handling backpressure in production
- Developing API gateway capabilities such as intelligent routing, traffic control, multi-environment separation, and upstream management
- Supporting enterprise authentication flows, including multi-provider identity federation and SSO integration without tight coupling to core services
- Defining and maintaining service boundaries across ingestion, processing, and delivery layers spanning Ruby and Python systems
- Investigating and resolving deep production issues such as deadlocks, queue saturation, and database contention — and removing the underlying systemic causes rather than patching symptoms
- Optimising PostgreSQL under heavy write load, including schema design, indexing strategy, connection scaling, and contention reduction
- Designing and tuning Elasticsearch for large-scale search workloads, including real-time, multilingual (including Arabic) relevance challenges
- Choosing the right execution model (async vs multi-process vs hybrid) depending on workload behaviour and system constraints
Who They Are:
A fast-scaling AI-driven customer experience intelligence platform operating across the MENA region.
The business is moving towards IPO-level scale and is actively re-architecting its core systems to support extreme growth, real-time intelligence, and AI-native capabilities.
The engineering culture is built around ownership at every level — engineers are expected to engage directly with system failures, production instability, and architectural weaknesses without waiting for escalation paths or strict ownership boundaries.
This is an environment where the system is treated as shared responsibility. If something is broken or inefficient, the expectation is that the person who sees it will step in and improve it.
The platform itself operates at very large scale, processing billions of customer interaction signals and evolving toward real-time AI-driven decisioning across enterprise clients.
What Is In It For You:
- Exposure to systems operating at genuine high scale (billions of events, not synthetic workloads)
- Direct visibility and impact with senior leadership, including CTO-level engagement
- Opportunity to work in a pre-IPO environment where engineering decisions directly shape product and company trajectory
- High autonomy with real ownership over critical systems
- The chance to work on complex distributed systems problems that sit at the edge of performance, reliability, and AI integration
Requirements:
- Strong grounding in distributed systems and an instinct for how systems fail under load
- Proven experience working with event-driven architectures and message-based systems in production environments
- Deep understanding of concurrency, system bottlenecks, and resilience patterns
- Hands-on experience investigating and resolving complex production incidents, with a focus on root cause elimination rather than surface fixes
- Background in backend engineering using Ruby on Rails or Python at scale
- Comfortable working across large, evolving codebases and quick to understand how the system behaves
- Strong opinions on architecture backed by real production experience and measurable outcomes
- Systems thinker: you naturally reason in terms of latency, throughput, failure modes, and cost efficiency
- You treat observability, testing, and system documentation as core engineering responsibilities, not optional extras
- Demonstrated ability to ship quickly without compromising system stability or reliability
- Track record of proactively improving systems beyond your immediate scope of responsibility
- Experience stabilising or rebuilding significant production systems, and the ability to clearly explain the technical challenges involved