Sr. Site Reliability Engineer (SRE)

SiFi

Saudi Arabia, Riyadh

Fresher

Posted 3 months ago

Job Description

This is a remote position.

About SiFi: SiFi is a rapidly growing B2B fintech company transforming expense management for businesses in Saudi Arabia. As an Electronic Money Institution (EMI) licensed by the Saudi Central Bank, we empower companies with innovative tools to simplify finance management.

Position Overview

We are looking for a Senior Site Reliability Engineer (SRE) who will take ownership of the reliability, performance, and scalability of our production systems. You will design, automate, and operate mission-critical environments that include Kubernetes clusters, database disaster recovery, workflow orchestration, and multi-region networking.

This role suits engineers who think deeply about systems, combining infrastructure, automation, and diagnostic reasoning to drive operational excellence.

Primary Responsibilities

Reliability, Availability & Infrastructure

Maintain and evolve multi-region cloud infrastructure using Terraform-based Infrastructure as Code (IaC).
Operate and optimize Kubernetes (OKE) clusters running microservices, data pipelines, and workflow orchestration.
Manage SQL Server backup/restore pipelines, DR testing, and performance optimization.
Ensure high availability for .NET and Python applications hosted behind load balancers and WAF.
Design and maintain cross-network connectivity (DRGs, LPGs, VCNs, subnets, and NSGs).

Observability & Automation

Build and maintain a centralized orchestration platform integrated with alerting and notification systems.
Develop self-healing, monitoring, and auto-remediation scripts for infrastructure and databases.
Implement logging, metrics, and tracing pipelines.
Automate recurring operational tasks using Python, Bash, and PowerShell to reduce manual effort and improve reliability.

DevOps, CI/CD & Security

Manage GitHub Actions and Octopus Deploy pipelines for backend and data services.
Apply strong security principles: least privilege, network segmentation, secure credentials, and encrypted communications.
Promote GitOps and Infrastructure-as-Code practices to ensure repeatable and traceable deployments.
Collaborate with developers to embed reliability and resilience into every release.

Collaboration & Incident Management

Lead incident response, run blameless post-mortems, and turn findings into lasting improvements.
Partner closely with engineering teams to drive design and code-level reliability improvements.
Conduct capacity planning, cost optimization, and system tuning for performance and scalability.
Mentor engineers in automation, observability, and root-cause analysis best practices.

Troubleshooting Mindset & Diagnostic Thinking

We Value Engineers Who