Planning how to overcome business interruptions can seem like an overwhelming or even impossible task. Our perceptions often are that disasters are big scary things . . . that happen to other people. Planning for them is a big, complicated job . . . that nobody has time for. Business continuity plans are huge binders full of microscopic details . . . that no one will ever use.
But the truth is far less daunting. Business Continuity Planning (BCP) is not rocket science, and it does not require expensive, unfamiliar tools. You don’t need mountains of time and unlimited resources. Business Continuity Planning is simply the practice of asking “What if…?” questions, then creating plans and preparing to address them.
Here are seven areas to keep in mind when approaching your Business Continuity Planning:
1. Get started
You can’t get anywhere until you take that first step.
Getting started involves a few basic components that are required to lay the foundation for a successful business continuity program. The first is developing an understanding of what it is you are planning for. Any number of events can cause a business interruption. As such, beginning the process by focusing on events can be too much to take in all at once. Instead, we focus on the impact, which is much easier to comprehend and plan for. There are three basic impact types we focus on and set baseline expectations for:
- Facility: Your primary place of business, which is either gone or inaccessible.
- Systems: These include the systems and infrastructure you rely on to support your critical business processes. During a business interruption, they are either unavailable or impaired.
- People: This is your team. For planning purposes, assume up to 50% of your team is unavailable for an extended period (8-12 weeks or more).
Once you know what you are planning for, you need to identify the planning team and operational areas that will need to be involved. Typically, this will include subject matter experts from each functional area of your credit union. The size of this team will vary depending on the size and complexity of your credit union. This may be two or three staff members at a small credit union and as many as thirty or more at larger credit unions.
You will also need to define the authority and oversight structure for the BCP program. Who will lead and be responsible for the effort? Who will have oversight of the program? At most credit unions, the CEO or a designated manager will lead the planning effort and the board of directors will fill the oversight role.
It is also important to identify and engage external resources that will be integral to the planning process. If you outsource your technology needs, the IT vendor(s) will play an important role. Ensuring you have adequate insurance in place and understanding how the claims process will work will also be key. Finally, becoming familiar with the BCP capabilities of third-party service providers will be a significant factor in your own planning efforts.
2. Complete a Business Impact Analysis (BIA)
Your business processes are the foundation of your Business Continuity Planning program. Before you can plan how to resume business processes following an interruption, you must first identify those processes and analyze the impacts that a disruption will have on them. This is done by completing a Business Impact Analysis (BIA).
The BIA process has several components. It begins by working with your planning team (those subject matter experts mentioned earlier) to identify and document your business processes. The most important thing to keep in mind when doing this is granularity. You are not writing a Standard Operating Procedures manual here, so that level of detail is not required. One of the assumptions we make in the planning process is that the people who will be implementing the plan will have a basic understanding of how the credit union functions and how processes are performed. Your already existing procedures manuals can provide more detailed backup as needed.
A good rule of thumb to follow is the “Five by Five” rule. In a nutshell, it posits that most functional areas should be able to describe the essence of what they do in about five business processes, and that each of those business processes should be able to be described in terms that the average person could comprehend in about five sentences or statements.
As an example, your finance/accounting team might identify their processes as Accounts Payable, Data Processing, General Ledger (GL) Reconciliation, Financial Reporting, and Asset Liability Management (ALM). While there may be many types of GLs that need reconciling each month, the process for each is similar enough to only need one process. If needed, you may include a listing of the various GL Accounts and the timing for each as a reference document in the Resources section of your BCP and reference it in the process description (or it may already be in your procedures manual).
It's important, however, NOT to narrow your focus to just those processes that you think will be critical, as the BIA will help quantify that for you. You want the list to be as logically comprehensive as possible without going too granular.
Once the processes are identified, along with a contact for each one (typically a supervisor or manager for the functional area), the team will fill in the high-level workflows for them. (Where does this process come from or originate? What does my team do to complete it? When finished, where does it go or what is the output?)
From there, the team will identify dependencies for each process. These can include other processes that may need to be completed before this process can be started, system dependencies such as the core environment or other applications, websites that are used when performing the process, third-party service providers that support the process, and infrastructure needed. They also can include physical (or electronic) dependencies in the form of templates, documentation, records, and equipment.
The final part of the BIA involves impact scoring. For each of the three impact types we plan for (Facility, Systems, and People), the team scores the process in several categories (such as Customer or Member, Financial, Legal/Regulatory, Reputational, and Data Loss) and sets the maximum allowable downtime for each category (how long this process could be interrupted before threatening the viability of the credit union).
These answers are used to determine your BIA Scores, Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) for each process.
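As a rough illustration of how the scoring answers might roll up, here is a minimal Python sketch. The 1-5 scale, the simple average used for the BIA Score, and the rule that the RTO equals the shortest maximum allowable downtime are all assumptions made for this example, not a prescribed method. The RPO is left out because it depends on how much data loss each process can tolerate, which the answers above do not specify.

```python
from __future__ import annotations

from dataclasses import dataclass

# Impact types and categories named in the text; the identifiers are illustrative.
IMPACT_TYPES = ("facility", "systems", "people")
CATEGORIES = ("member", "financial", "legal_regulatory", "reputational", "data_loss")


@dataclass
class ProcessAssessment:
    name: str
    # scores[impact_type][category] on an assumed scale of 1 (negligible) to 5 (severe)
    scores: dict[str, dict[str, int]]
    # Maximum allowable downtime per category, in hours (assumed unit)
    max_downtime_hours: dict[str, float]

    def bia_score(self) -> float:
        """Average every impact score; one simple way to rank processes."""
        values = [self.scores[t][c] for t in IMPACT_TYPES for c in CATEGORIES]
        return sum(values) / len(values)

    def rto_hours(self) -> float:
        """Assume the RTO is set by the least tolerant category."""
        return min(self.max_downtime_hours.values())


# Hypothetical answers for one process
ach = ProcessAssessment(
    name="ACH/Draft Exceptions",
    scores={t: {c: 4 for c in CATEGORIES} for t in IMPACT_TYPES},
    max_downtime_hours={
        "member": 4,
        "financial": 8,
        "legal_regulatory": 4,
        "reputational": 24,
        "data_loss": 8,
    },
)
print(ach.name, ach.bia_score(), ach.rto_hours())
```

A weighted average, or taking the worst score instead of the mean, would be equally reasonable choices. What matters is applying the same rule to every process so the results can be compared.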
3. Develop the Business Continuity Plan (BCP)
With the BIA in hand, you can now build your Business Continuity Plan. The BCP typically includes:
- High-level checklists for your team to follow in each of the impact types (Facility, Systems and People)
- One or more workgroup recovery worksheets (one for each credit union location)
- Detailed lists of the roles present at each location and how many of each
- Who is needed back when to support the business processes identified in the BIA
- Relocation options for team members in a facility impact event
For each business process, you will also identify the high-level recovery strategy for each impact type (Facility, Systems, and People), considerations, and resource needs. Together, the checklists, worksheets, and process information make up the completed BCP (along with any needed supporting resources).
4. Develop a Systems Recovery Plan (if applicable)
If your technology infrastructure is internally managed, Systems Recovery Planning (historically known as Disaster Recovery Planning) complements the BCP by documenting how your systems will be recovered in support of business process Recovery Time Objectives (RTOs).
To do this, your IT team (supported by your IT vendors) will identify key systems and infrastructure used to support credit union operations and member service. Once identified, the team will determine the system RTOs needed to support the documented business process RTOs. As an example, if the ACH/Draft Exceptions process has an RTO of four hours (typical because this must be completed by a specific time each business day), then it follows that the RTO for the systems and infrastructure needed to perform that process is also no more than four hours. If they are out of alignment, measures will be needed to mitigate the gap as soon as possible.
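The alignment check described above is easy to automate once process and system RTOs are recorded. The sketch below is a minimal Python example. The system names, the Financial Reporting RTO, and all system RTO values are hypothetical; only the four-hour ACH/Draft Exceptions RTO comes from the example above.

```python
def find_rto_gaps(process_rtos, process_systems, system_rtos):
    """Return (process, system, process_rto, system_rto) tuples for every
    system whose RTO is longer than the RTO of a process that depends on it."""
    gaps = []
    for process, process_rto in process_rtos.items():
        for system in process_systems.get(process, []):
            system_rto = system_rtos[system]
            if system_rto > process_rto:
                gaps.append((process, system, process_rto, system_rto))
    return gaps


# All RTOs in hours. Values other than the ACH example are made up.
process_rtos = {"ACH/Draft Exceptions": 4, "Financial Reporting": 72}
process_systems = {
    "ACH/Draft Exceptions": ["core", "internet"],
    "Financial Reporting": ["core", "file_server"],
}
system_rtos = {"core": 8, "internet": 2, "file_server": 24}

gaps = find_rto_gaps(process_rtos, process_systems, system_rtos)
for process, system, needed, actual in gaps:
    print(f"{process}: {system} recovers in {actual}h but {needed}h is needed")
```

With these sample values, the only gap reported is the core system, which would take eight hours to recover against a four-hour process RTO.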
For each system or infrastructure component identified, the detailed technical recovery process is then documented. This should include:
- Planning assumptions
- Supporting systems or infrastructure required to support the recovery of this component
- Resources needed to both recover the component and validate that it is functioning correctly
- Detailed validation process
- Known deficits of the recovery solution
Once each systems recovery component technical process is documented, a Systems Recovery Playbook should be developed to determine and document the priority and order of recovery for each component.
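One way to derive a workable recovery order is to treat the "supporting systems or infrastructure required" entries as dependencies and sort them so that every component is recovered after the things it relies on. The following Python sketch uses the standard library's `graphlib` module (Python 3.9 or later). The component names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Each component maps to the supporting components that must be recovered first.
# Names and relationships are illustrative only.
depends_on = {
    "power": set(),
    "network": {"power"},
    "directory_services": {"network"},
    "core_system": {"network", "directory_services"},
    "online_banking": {"core_system"},
}


def recovery_order(dependencies):
    """Return components in an order where each one follows its dependencies.

    Raises graphlib.CycleError if two components depend on each other.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = recovery_order(depends_on)
for step, component in enumerate(order, start=1):
    print(step, component)
```

This covers only the dependency side of the playbook. Where several components could be recovered at the same point, the team would still need to rank them, for example by the shortest business process RTO each one supports.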
5. Develop and document the incident management process
Incident Management is what happens between the start of an event and the activation of your Business Continuity Plan (and/or Systems Recovery Plan, if applicable). It documents the notification, escalation, and triage processes up to the point of disaster declaration and includes resources and procedures the incident management team will utilize in responding to and managing an event or incident. Because incidents can occur in any number of forms, the incident management process must be easily adaptable to most situations. As such, this document is typically focused on roles, responsibilities, and resources instead of step-by-step procedures.
Roles and responsibilities typically start with the on-scene personnel who are either present at the time of the event or arrive onsite to investigate what has happened. As such, all staff should have a basic understanding of the initial actions to follow in an emergency and who to contact or escalate to for assistance.
Most incidents fall into one of three categories: Life/Safety, Break/Fix, or Cyber. Each of these incident types carries unique areas of focus and response activities:
- Life/Safety: The safety of staff and visitors is the primary focus of on scene personnel. This would mean either evacuating or sheltering in place depending on the event, alerting public safety officials, and notifying the management team of the situation.
- Break/Fix: Alerting the team or vendor responsible for the impacted equipment, systems, or infrastructure is the priority. The notified team or vendor would then determine the impact and its anticipated duration, escalating to the management team as needed.
- Cyber: The focus is on identifying the nature and scope of the incident and taking steps to contain and control it. This includes notifying the Cyber Incident Response team, and (when warranted) notifying law enforcement, regulatory agencies, and any impacted members.
Other roles include the decision-making authority, usually the credit union CEO or their alternates if the CEO is incapacitated or unavailable. Typically, there is a clearly identified business continuity order of succession to ensure continuity of leadership. The management team supports the CEO and directs the activities of their respective areas during the response and recovery process. As a whole, the incident management team may include the CEO and members of the management team, along with other personnel as needed to provide support for HR, facilities, technology, and communications needs.
Command center information will identify internal, external, and virtual options for gathering the incident management team and coordinating, directing, and supporting the response and recovery effort. For notification and escalation purposes, a contact list for alerting/activating key staff, a notification process for keeping all staff informed of developments, and a key vendor/other contacts list is needed.
Incident monitoring information will allow the incident management team to access news and information resources, as well as detail mechanisms used to support internal communications with staff at other locations or offsite. Communications information will identify the credit union spokesperson (and alternates) and provide communications templates for Facility, Systems, or People impacts; disaster declaration; and more as needed.
Resources can be added to support or enhance the Incident Management process, such as wallet cards or other types of quick reference guides, expanded contact lists, terms, and other reference materials as needed.
Because there are significant differences between a cyber incident and other types of incidents, an expanded Cyber Incident Response process is typically published separately to augment the basic information contained in the Incident Management document. This would include roles and responsibilities specific to the Cyber Incident Response team, contacts (including the cyber insurance carrier/cyber response vendor, state regulatory and law enforcement agencies, NCUA contacts and reporting information, board and supervisory chairs, outside counsel, and league/association), and process guidance for the various phases of the Cyber Incident Response process (including preparation, detection and analysis, containment, eradication and recovery, post-incident activity, the cyber incident reporting process, member notification, and communications templates).
A resources section could include cyber incident quick reference guides or wallet cards, along with specific guidance for common types of cyber incidents and response planning (such as Corporate Account Take Overs (CATO), Distributed Denial of Service (DDoS) attacks, and Malware/Ransomware attacks).
Emergency procedures (typically also published separately) provide support and guidance to staff and management for a variety of situations and typically include additional procedures for robbery, suspicious packages, bomb threats, or kidnapping events.
6. Complete a risk assessment
The Risk Assessment goes beyond the basic impact-based assessment of the BIA (Facility, Systems, and People) and looks at individual, specific threats such as fires, floods, power failures, severe weather events, and more. While it is not needed to build your Business Continuity Plan, it does have significant value in identifying and mitigating specific risks. Most often, this document is used to guide strategic risk-related preventative or mitigation efforts each year.
It begins much like the BIA did, only this time you are identifying threats instead of business processes. This works best when brainstorming with your management team, whiteboarding and then curating a starting list to build from.
As each threat is identified and a general description is supplied, the team would then determine which of the BIA impact types (Facility, Systems, and People) the threat would affect, assign a probability of the threat manifesting, and describe the specific impacts of the event were it to occur. These items, plus the velocity (or speed of onset) of the threat, are the most common factors used to identify a Risk Score and Initial Risk Rating for each threat.
Mitigating factors (or controls) that you have in place to decrease the likelihood of an event happening or reduce its impacts help to determine the Residual Risk Rating of each threat once those controls are factored in.
Lastly, a Gap Analysis is done to identify additional controls that could be implemented to further reduce the risk. It is from the Gap Analysis that risk-mitigating strategic initiatives are selected (typically a few each strategic planning period).
Take fire as an example risk, described as an uncontrolled burn that damages or destroys a facility. It could impact your Facility, Systems, and People. The probability (assuming no controls are in place) may be fairly high on any given day. A fire can break out in seconds in the right conditions, giving it an immediate speed of onset. The impacts could be damage to a section of or loss of the entire facility, damaged or destroyed systems and infrastructure, injury or death to staff or visitors, and smoke and water damage resulting from firefighting efforts.
All of this would likely give fire a high Risk Score and Initial Risk Rating. Mitigation factors could include smoke detectors, sprinkler systems, fire extinguishers or fire suppression systems, evacuation planning efforts and frequent evacuation drills, and investigating and mitigating any potential fire hazards. This could bring the Residual Risk Rating to moderate. Your Gap Analysis might note a large number of personal space heaters in the facility that could be reduced or eliminated. This would be compared against Gap Analyses from the other threats and potentially selected for implementation over the next year.
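To make the fire example concrete, here is a minimal Python sketch of one possible scoring scheme. Everything numeric in it is an assumption for illustration: the 1-5 scales, multiplying the three factors together, the rating thresholds, and the idea of expressing controls as a single effectiveness fraction. Your own Risk Assessment may use different factors or weights.

```python
def risk_score(probability, impact, velocity):
    """Multiply three ratings, each on an assumed 1-5 scale (result 1 to 125)."""
    return probability * impact * velocity


def residual_score(initial_score, control_effectiveness):
    """Reduce the initial score by the share of risk the controls remove (0.0 to 1.0)."""
    return initial_score * (1 - control_effectiveness)


def rating(score):
    """Map a score to a rating using hypothetical thresholds."""
    if score >= 60:
        return "high"
    if score >= 20:
        return "moderate"
    return "low"


# Fire: fairly high probability without controls, severe impact, immediate onset.
fire_initial = risk_score(probability=4, impact=5, velocity=5)

# Assume detectors, sprinklers, and drills together remove about half the risk.
fire_residual = residual_score(fire_initial, control_effectiveness=0.5)

print("Initial:", fire_initial, rating(fire_initial))
print("Residual:", fire_residual, rating(fire_residual))
```

Under these assumed values, fire starts with a high Initial Risk Rating and drops to a moderate Residual Risk Rating once controls are factored in, matching the narrative above.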
7. Incorporate plan validation and maintenance in the program
Validation and maintenance activities include:
- Testing or exercising your plans to identify gaps in the planning process
- Education
- Awareness-building activities to enhance staff and management’s understanding of the program and what to do when something happens
- Reviewing, updating, and incorporating new content on a regular basis (at least annually)
Testing or exercising your plans can involve multiple options. The most basic exercise is called an Orientation/Walkthrough, where the documentation is shared with management, staff, or specific teams to familiarize them with the content and how it is used when needed. From there, the next level of testing is a Tabletop Exercise, where one or more event scenarios are introduced, and the participants use the plans and their knowledge to respond. A Functional Exercise can involve testing the recovery process of one or more systems, staff testing of workaround or manual processes, or staff relocation to an alternate site or remote location. Scaled Exercises simply combine one or more functional exercises into a larger event, up to and potentially including a full site or multisite event.
Controls Testing is generally more focused on risk management than on BCP, but the two overlap frequently. You are probably already performing many control tests without thinking of them as such. Fire alarm and security system testing, generator tests, call tree testing, and work-from-home days are all forms of controls testing. Most credit unions brainstorm with their teams to identify the controls testing already in place. Other controls can be found in your Risk Assessment and BCP and may already be tested or easily incorporated. Over time, the list can be gradually expanded as the program matures or capabilities change.
Conclusion
It’s important to remember that business continuity planning is a marathon, not a sprint. You don’t have to do it all at once. Plans can be developed over time and components added later as the program matures. The Risk Assessment, for example, is not needed to develop your initial BIA, BCP, and Incident Management documentation. It can be developed later during the annual maintenance cycle or as a stand-alone effort. You may start your plan testing with just an Orientation/Walkthrough and Tabletop Exercise and incorporate or expand functional testing in subsequent testing cycles. If you take that first step and keep going, you will get there.