...
If you become aware of an imminent or active disaster that affects CloudCard, immediately contact the Managing Director or the Principal Engineer, by phone if possible.
If a disaster has occurred, please contact your supervisor or the Customer Support team as soon as possible to let them know whether you are safe.
Key Contacts | CloudCard Support; Luke Rettstatt / Managing Director, Phone: (434) 253-5657; Anthony Erskine / Principal Engineer, Phone: (434) 248-0444 |
Key Locations | Primary Office: 1103 Wise Street, Lynchburg, VA 24504; Online Meeting Room: |
...
Strategy for maintaining continuity of services:
KEY BUSINESS PROCESS | CONTINUITY STRATEGY |
---|---|
Customer Service Delivery | Rely on AWS availability commitments and SLAs; use a multi-site active-active deployment, with cross-region backups where possible. |
IT Operations | Use SaaS applications or AWS-hosted applications so that operations do not depend on a single physical location and are conducive to remote work arrangements. Use Gmail, which is distributed by nature, and rely on Google's standard service level agreements. |
Customer Support | All systems are vendor-hosted SaaS applications; use Gmail as the communications channel if the helpdesk is down. |
Finance, Legal and HR | All systems are vendor-hosted SaaS applications. |
Sales and Marketing | All systems are vendor-hosted SaaS applications. |
Person | Roles | Responsibilities |
---|---|---|
Managing Director | Coordination and Communication | Determine activation of plan; Coordinate employee response; Coordinate communication of status internally and externally; Prioritize activities to ensure safety, security, and core services are maintained or restored as soon as possible; Work with Customer Support team to ensure employee safety; Review and Test plan annually; Ensure pizza is provided |
Principal Engineer | Technical Execution; Alternate for Managing Director | Provide technical guidance on mitigating actions to the Managing Director; Ensure all failovers complete smoothly; Deploy new infrastructure to replace failed infrastructure where necessary; Review and Test plan annually; Designate and brief alternate person in case of unavailability |
Customer Support Team | Communication | Communicate with employees to ensure safety; Monitor employee safety status; Communicate status to customers and resellers; Handle questions from customers and resellers |
Engineering Team | Technical Execution | Support Principal Engineer as needed to recover services |
Revision History
* Note: Prior to April 2023, CloudCard had a Business Continuity Plan that was separate from the Disaster Recovery Plan. At the end of March 2023, we merged the two plans as part of our SOC 2 compliance preparations.
Version | Date | Description | Author | Approved by |
---|---|---|---|---|
1.1 (Disaster Recovery Plan) | October 2018 | Initial Plan | ||
1.1 (Disaster Recovery Plan) | November 2019 | Clarity and Accuracy Updates | ||
1.2 (Disaster Recovery Plan) | March 2021 | Updates to Contact Details | ||
1.2 (Disaster Recovery Plan) | February 2023 | Accuracy Update | ||
1.3 (Disaster Recovery Plan) | March 2023 | Updated to reflect Active-Active AWS strategy | ||
1.4 (Disaster Recovery Plan) | March 2023 | Improved based on results of testing of plan | ||
2.0 (Business Continuity and Disaster Recovery Plan)* | 2023-03-29 | Merged Disaster Recovery with Business Continuity | Ryan Heathcote | Luke Rettstatt |
...
2.1 | 2024-07-20 | Updates from review of annual test | Ryan Heathcote | Luke Rettstatt |
Appendix: Disaster Recovery Strategies
...
If the disaster affects the Virginia region:
Consult news sources to gather information on the impact of the disaster.
If employee safety could be affected, immediately direct the Customer Support Team to follow the Appendix: Employee Safety Confirmation Process.
Check the CloudCard Internal Status Dashboard (an internal site, known to relevant employees, that shows CloudCard system health and relevant AWS status feeds).
Determine if CloudCard systems are experiencing downtime
Determine if AWS has published any notices.
Attempt to log into CloudCard
Attempt to log into the AWS console
Observe the state of the database and application environments:
Are the major components (autoscaling functionality, RDS cluster) still operational?
Are autoscaling and failover functioning normally and recovering the services?
Is service recovery trending towards normal within 15 minutes?
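The three questions above amount to a single yes/no judgment: is the platform recovering on its own? A minimal Python sketch of that judgment follows. The `Observation` structure, its field names, and the function name are hypothetical and chosen only for illustration; the 15-minute window is the only value taken from the checklist, and the sketch assumes the answers are gathered manually from the AWS console and the status dashboard.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the checklist: recovery should trend to normal in under 15 minutes.
RECOVERY_WINDOW_MINUTES = 15


@dataclass
class Observation:
    """Answers to the three checklist questions (hypothetical structure)."""
    major_components_operational: bool  # autoscaling functionality and RDS cluster
    failover_recovering_services: bool  # autoscaling and failover behaving normally
    estimated_minutes_to_normal: Optional[float]  # None if no recovery trend is visible


def is_self_recovering(obs: Observation) -> bool:
    """Return True only if all three checklist questions are answered 'yes'."""
    if not (obs.major_components_operational and obs.failover_recovering_services):
        return False
    if obs.estimated_minutes_to_normal is None:
        return False
    return obs.estimated_minutes_to_normal < RECOVERY_WINDOW_MINUTES
```

For example, `is_self_recovering(Observation(True, True, 10))` is true, while an estimate of 20 minutes or a failed RDS cluster yields false. What to do in each case is decided by the plan itself, not by this sketch.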
...
Review news from Google to determine timeline for restoration of services.
Update the CloudCard service updates page to indicate outage.
Customer Support team communicates with customers via HelpScout.
As a last resort, if service is unlikely to be restored for a significant period, and other email providers are fine, set up temporary or permanent operations on a different email provider (Microsoft Office 365) and repoint DNS records.
HelpScout unavailable:
Review news from HelpScout to determine timeline for restoration of services.
Update the CloudCard service updates page to indicate outage.
Customer Support team communicates with customers via Google Workspaces email.
As a last resort, if service is unlikely to be restored for a significant period, and other help desk providers are fine, set up temporary or permanent operations on a different helpdesk provider.
SquareSpace unavailable:
Review news from SquareSpace to determine timeline for restoration of services.
Set up a simple static HTML site in S3 and repoint DNS for the website to the static site.
Update the CloudCard service updates page to indicate outage.
Customer Support team monitors HelpScout for customer questions.
Log into the Google Support account and review emails with the Support label.
As a last resort, if service is unlikely to be restored for a significant period, and other website hosting providers are fine, set up temporary or permanent operations on a different website hosting provider, or rebuild the website on the S3 static site.
Sales / Accounting / Task Tracking / SquareSpace unavailable:
These services do not affect CloudCard’s immediate ability to serve customers. If they become unavailable, staff should use spreadsheets or manual systems to track information until the system comes back online. If the outage is likely to be prolonged, CloudCard should seek another service provider.
...
The CloudCard office is unavailable
Most employees' home offices are affected by the disaster
The entire region within at least 1 hour driving time of Downtown Lynchburg is affected by the disaster.
Some locations within 12 hours driving time of Downtown Lynchburg are unaffected
At least some employees can safely commute to an unaffected location.
...
The CloudCard office is unavailable
Most employees' home offices are affected by the disaster
The entire region within at least 12 hours driving time of Downtown Lynchburg is affected by the disaster.
It is therefore impossible to find a location within 12 hours driving time of Downtown Lynchburg that is safe to work from, connected to the internet, and that at least some employees can commute to.
...
Asset | Scenario | Recovery Strategy | Recovery Time Objective (RTO) | Recovery Point Objective (RPO) |
---|---|---|---|---|
AWS Data and Services | Amazon data center failure or destruction | Autoscaling, failover, or restoration of backups | < 1 hour | < 1 hour |
Main Office | Major utility outage | Alternate work location | < 1 hour | < 1 hour |
Employee Home Offices | Major utility outage | Alternate work location | < 12 hours | < 12 hours |
Google Workspaces | Major service outage | Rely on Google SLAs | | |
HelpScout | Major service outage | Use Gmail until service restored | | |
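
During an incident or an annual test, it can help to compare elapsed downtime against the objectives in the table. The following Python sketch shows one way to do that, assuming the RTO values are copied by hand from the table; the dictionary and function names are hypothetical, and assets whose RTO cell is blank are represented as `None` and not guessed.

```python
from datetime import timedelta

# RTO values copied from the table above; None marks cells the table leaves blank.
RTO = {
    "AWS Data and Services": timedelta(hours=1),
    "Main Office": timedelta(hours=1),
    "Employee Home Offices": timedelta(hours=12),
    "Google Workspaces": None,
    "HelpScout": None,
}


def rto_status(asset: str, elapsed: timedelta) -> str:
    """Compare elapsed downtime for an asset against its RTO from the table."""
    target = RTO[asset]
    if target is None:
        return "no RTO defined"
    # The table states objectives as strict upper bounds ("< 1 hour").
    return "within RTO" if elapsed < target else "RTO exceeded"
```

For instance, `rto_status("Main Office", timedelta(minutes=30))` reports that the outage is still within its objective, while the same call for `"HelpScout"` reports that no objective is defined.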
...