Location: Remote
Department: Engineering / Infrastructure
Reports to: CTO
About the Role:
We are seeking a highly skilled Senior DevOps Engineer to design, build, and maintain our cloud infrastructure across AWS and Azure. You will take ownership of our Kubernetes-based deployments (EKS and AKS), ensure system reliability, and drive automation and observability across environments.
The ideal candidate is a problem-solver with deep technical knowledge, strong scripting skills, and a passion for building scalable, secure, and efficient systems. You will collaborate closely with engineering, QA, and product teams to improve deployment workflows and support continuous delivery at scale.
Key Responsibilities:
- Design, deploy, and maintain Kubernetes clusters (EKS, AKS) and cloud-native applications.
- Implement and manage Vault for secrets management and automated configuration workflows.
- Develop and maintain CI/CD pipelines using Jenkins, ArgoCD, and GitHub Actions.
- Implement Infrastructure as Code (IaC) using Terraform or CloudFormation, with policy-as-code enforcement (e.g., OPA, or Sentinel/OPA policy sets in Terraform Cloud).
- Establish and maintain monitoring, logging, and observability stacks (Prometheus, Grafana, Loki, CloudWatch, OpenTelemetry).
- Define and track SLIs/SLOs, manage error budgets, and drive a site reliability engineering (SRE) culture.
- Manage runtime environments for PHP-FPM, Go, and Python applications.
- Configure and optimize databases (MySQL and NoSQL stores) and caching (Redis) for performance and reliability.
- Maintain search and queueing systems (Elasticsearch, RabbitMQ, Amazon SQS).
- Ensure security and compliance by applying DevSecOps best practices (IAM, VPC design, vulnerability scanning, SOC 2/ISO 27001 alignment).
- Respond to production incidents quickly, lead root cause analysis (RCA), and implement preventive measures.
- Develop and test Disaster Recovery (DR) and business continuity plans.
- Collaborate across teams to optimize cost, performance, and delivery pipelines.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- 5+ years of hands-on DevOps, SRE, or Cloud Engineering experience.
- Expertise in Kubernetes (including EKS and AKS) and container orchestration.
- Strong knowledge of AWS and Azure architecture and services.
- Experience with Vault for secrets and configuration management.
- Strong scripting and programming skills in Bash, Python, Go, or PHP.
- Proven experience with CI/CD tools (Jenkins, ArgoCD, GitHub Actions).
- Advanced skills in monitoring, alerting, and observability (Grafana, Prometheus, Datadog, CloudWatch).
- Familiarity with Helm, service mesh (Istio, Linkerd), and cloud-native logging (Fluentd, Loki).
- Proficiency with MySQL, NoSQL databases, Redis, Elasticsearch, RabbitMQ, and SQS.
- Deep understanding of networking (DNS, routing, VPN, firewalls, load balancing).
- Strong incident management and troubleshooting skills.
- Experience implementing DR, backup, and high-availability strategies.
- Solid understanding of DevSecOps, IAM, and security compliance frameworks.
- Experience with serverless architectures (AWS Lambda, Azure Functions).
- Knowledge of cost optimization and performance tuning across multi-cloud environments.
- Experience with distributed tracing tools (Jaeger, Zipkin).
Soft Skills:
- Excellent problem-solving and analytical skills.
- Strong communication and teamwork abilities across distributed teams.
- Ability to work under pressure and make decisions quickly during incidents.
- A passion for automation, scalability, and continuous improvement.
- Mentorship mindset and willingness to share knowledge.