We're looking for a talented Site Reliability Engineer (SRE) to keep our systems running smoothly, reliably, and at scale. Through smart automation, deep observability, and a calm head in a crisis, you'll help us balance speed, compliance, and stability, working alongside DevOps, Cloud, Quality Engineering, and Product teams to drive continuous improvements in performance, security, and resilience.
Experience And Qualifications
- 5+ years of experience in SRE or DevOps roles, building and managing large-scale, high-availability systems across banking, fintech, e-commerce, or other data-intensive digital ecosystems.
- Bachelor's degree in Computer Science or equivalent technical experience.
- Strong experience with Linux environments and performance troubleshooting.
- Proven expertise in Terraform and Infrastructure as Code (IaC) methodologies.
- Proficiency with Kubernetes and container orchestration in microservices environments.
- Hands-on experience with AWS (preferred); exposure to Azure or GCP is an advantage.
- Deep knowledge of Dynatrace (AIOps, Davis AI), Prometheus, Grafana, and the ELK stack.
- Experience implementing AI / ML-driven reliability or automation solutions (AIOps, anomaly detection, predictive alerting).
- Practical understanding of CI / CD pipelines (GitHub Actions, Jenkins, GitLab CI / CD or Azure DevOps).
- Experience with Kafka, RabbitMQ, Redis, Aurora, and RDS databases.
- Strong scripting or programming skills in Python, Bash, or Go.