Takamol Holding

Site Reliability Engineering Officer

This job is no longer accepting applications

  • Posted 26 days ago

Job Description

  • Provide support for application incidents across digital platforms, working closely with Platform Engineering, Application Development, and customer support teams to ensure timely resolution according to established SLAs and escalation procedures.
  • Operate and monitor the Elastic Observability stack — including Elasticsearch cluster health, Kibana, Fleet Server, APM Server, and Elastic Agent — deployed and managed via ECK on OKE.
  • Assist with day-to-day Elasticsearch operations such as index lifecycle management (ILM), snapshot lifecycle management (SLM), data tier housekeeping (hot, warm, cold, frozen), and capacity monitoring.
  • Troubleshoot telemetry ingestion issues across logs, metrics, traces, and synthetic monitors, ensuring consistent data collection from all platforms.
  • Maintain and update Kibana dashboards, alerting rules, and saved objects under the guidance of the SRE Manager.
  • Perform root cause analysis and participate in blameless post-incident reviews to improve system reliability and reduce recurrence.
  • Collaborate with Platform Engineering to automate repetitive tasks, improve deployment pipelines, and enhance observability coverage using Terraform, Helm charts, and scripting.
  • Develop and maintain support documentation, runbooks, and knowledge base articles aligned to standardized incident response procedures.
  • Manage and prioritize incidents and requests via the ticketing system (Jira/ServiceNow), ensuring all incidents, requests, and resolutions are documented in the service management system.
  • Participate in an on-call rotation and help reduce operational toil through automation and tooling.
  • Monitor and report on key performance metrics related to incident management, including mean time to detect (MTTD) and mean time to resolve (MTTR).
  • Collaborate with cross-functional teams and vendor partners to improve overall system reliability, observability maturity, and security posture.

Job Requirements

  • Bachelor's degree in Computer Science, IT, Engineering, or related field (or equivalent experience).
  • 1–3 years of experience in IT operations, system administration, application support, DevOps, or SRE.
  • Familiarity with observability tools such as the Elastic Stack (Elasticsearch, Kibana, etc.), including basic querying and dashboard usage.
  • Knowledge of Linux systems and scripting (Bash, Python, or Go).
  • Understanding of monitoring, logging, and alerting concepts.
  • Experience with ITSM tools (ServiceNow, Jira, Zendesk) and ITIL practices.
  • Strong grasp of incident, problem, and change management.
  • Basic experience with cloud-native environments and container technologies such as Docker and Kubernetes.
  • Strong critical thinking, troubleshooting, and communication skills.

Job ID: 146714019