THE ROLE
As a Lead Disaster Recovery Engineer, you will play a critical role in ensuring our organization's resilience against unforeseen disruptions. Join us at the Mountain, where culture and values are practiced and respected every day.
PRIMARY RESPONSIBILITIES
- Plan, schedule, and manage all phases of DR testing, including tabletop exercises, failover drills, and full-scale disaster simulations.
- Drive accountability and ownership with application and infrastructure teams to ensure all DR planning and execution tasks are completed, addressing instances where tests are waived due to a lack of planning.
- Support application teams in the development, maintenance, and updating of comprehensive DR plans, ensuring alignment with business requirements and industry best practices.
- Collaborate with the Business Impact Analysis (BIA) Team to conduct regular BIA and CRAM rating assessments.
- Act as a coordinator for the preparation of DR documents and tests.
- Provide recommendations for DR plan improvements and recovery strategies based on risk requirements.
- Automate DR testing processes and tools.
ADDITIONAL RESPONSIBILITIES
- Conduct DR test kick-off meetings and set expectations in line with the DR process goals.
- Perform quality reviews of DR plan and test result documents.
- Coordinate DR tests according to the DR test calendar.
- Ensure DR plans and results are documented and approved for annual audit reviews. The test results summary and output must be in a format that passes external audits and satisfies clients and customers regarding system resiliency.
- Ensure that vendor-managed DR tests adhere to the organization's processes.
- Consult with assigned application support teams to conduct BIA assessments and to identify and resolve gaps.
- Maintain accurate records of DR plans, test results, and related documentation.
- Prepare and present reports on DR testing activities, findings, and recommendations for management.
- Liaise with IT staff, business operations, end-users, project teams, infosec, and TRP/ARB teams to ensure DR readiness.
- Identify and implement improvements to the DR testing process and plans.
- Drive automation of functional DR tests and setup.
QUALIFICATIONS
- Proven experience in managing disaster recovery testing programs and developing DR plans.
- Strong understanding of IT infrastructure, systems, and disaster recovery techniques, methods, and technologies.
- Excellent project management skills with the ability to manage multiple parallel projects and priorities.
- Strong communication and interpersonal skills, with the ability to effectively interact with a variety of stakeholders.
- Strong analytical and problem-solving skills.
- Demonstrated ability to drive accountability and ownership within application and infrastructure teams.
- Experience in creating test results summaries and outputs suitable for external audits and demonstrating system resiliency to clients.
NICE TO HAVE
- Relevant certifications such as CBCP (Certified Business Continuity Professional) or similar.
- Knowledge of the ITIL framework and IT service management best practices.
- Exposure to and understanding of how to leverage AI/ML to enhance DR testing efficiency, predictability, and reporting.
EDUCATION
- Bachelor's degree in Computer Science, Information Technology, or equivalent.
- Relevant certifications in Disaster Recovery or Business Continuity.
Category: Information Technology