IBM Resiliency Orchestration L3 Engineer
Primary Roles and Responsibilities (R&R)
- Design & Implementation of Recovery Workflows: Developing and maintaining automated failover and failback processes using over 450 pre-packaged patterns for enterprise applications.
- Configuring Cyber Incident Recovery (CIR) features, such as immutable WORM storage and air-gapped mechanisms, to protect against malware and ransomware.
- Advanced Troubleshooting & L3 Support: Serving as the final point of escalation for complex issues within the IBM Resiliency Orchestration software suite.
- Performing root-cause analysis (RCA) on replication failures or deviations from Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
- Automation & Scripting: Developing custom scripts and playbooks (e.g., in Python, Bash, or Ansible) to automate repetitive IT operations and reduce operational toil.
- Integrating IBM Resiliency Orchestration with other enterprise security platforms, such as QRadar SOAR, QRadar SIEM, or Splunk, for proactive threat response.
- Readiness Validation & Compliance: Executing automated, non-disruptive DR drills and dry runs to detect environment changes that might cause recovery failures.
- Generating real-time compliance and audit reports via the centralized dashboard to ensure regulatory requirements are met.
- Strategic Advisory: Collaborating with SRE and DevOps teams to align IBM Concert or Resiliency Orchestration (RO) tools with the organization's broader resilience strategy.
- Advising on hybrid multicloud infrastructure risks and proactively mitigating potential outage points.
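
As an illustration of the RTO/RPO deviation analysis listed under Advanced Troubleshooting, the check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `DrillResult` record and the `check_objectives` helper are hypothetical names invented for this example and do not reflect any IBM Resiliency Orchestration API or report schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """One recovery drill outcome (hypothetical record, not an IBM RO schema)."""
    group: str
    last_replicated: datetime   # last consistent copy available at the DR site
    outage_start: datetime      # moment the primary site became unavailable
    service_restored: datetime  # moment the service was usable again


def check_objectives(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare achieved recovery time and data loss against the targets."""
    achieved_rto = result.service_restored - result.outage_start
    achieved_rpo = result.outage_start - result.last_replicated
    return {
        "group": result.group,
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        # A breach of zero means the objective was met.
        "rto_breach": max(achieved_rto - rto, timedelta(0)),
        "rpo_breach": max(achieved_rpo - rpo, timedelta(0)),
    }


if __name__ == "__main__":
    drill = DrillResult(
        group="payments-db",  # assumed example name
        last_replicated=datetime(2024, 1, 10, 9, 50),
        outage_start=datetime(2024, 1, 10, 10, 0),
        service_restored=datetime(2024, 1, 10, 10, 45),
    )
    report = check_objectives(drill, rto=timedelta(minutes=30), rpo=timedelta(minutes=15))
    print(report["rto_breach"], report["rpo_breach"])
```

In practice the timestamps would come from drill reports or replication logs, and any non-zero breach would be the starting point for root-cause analysis.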
Core Technical Requirements
- Expertise in IBM Infrastructure and hybrid cloud platforms.
- Proficiency in scripting and infrastructure-as-code tooling (Python, Ansible, Terraform) and a working knowledge of network protocols.
- Deep understanding of Copy Data Management and immutable storage technologies.