...

Effective Date: 2023-05-01

TODO:

appendix - scenarios

Quick Reference

If you believe a disaster that will affect our business has occurred or is imminent, contact the Managing Director or the Principal Engineer as soon as possible, preferably by phone.

...

If a situation is so severe that even a minimal number of CloudCard staff cannot safely relocate to an internet-connected work location, we assume that it is immaterial for CloudCard to continue operations. For example, if a natural disaster destroyed power and network infrastructure across the entire Eastern and Central United States, both CloudCard’s ability to continue operations and the meaningfulness of those operations would be severely limited. In such situations, very few people will be connected to the internet at all, so CloudCard’s employees should focus on finding safety and taking care of others. Once power or network infrastructure is restored to a sufficient extent, CloudCard should be able to resume continuity efforts according to one of the Scenarios below.

Disasters affecting AWS Availability Zone(s)

Plan of Action

  1. Assemble team in the appropriate meeting room (Managing Director)

  2. Pray

  3. Monitor service failover and deploy backup infrastructure (Principal Engineer)

    1. Ensure database failover occurs; add an additional read replica if needed.

    2. Ensure auto scaling replaces lost services with new nodes.

      1. If the application scaling infrastructure is disabled, create a new application environment from backed-up code artifacts.

    3. Determine whether any data loss occurred or whether data needs to be corrected (e.g. to prevent stuck jobs). If so, restore or recover the data from backups.

    4. Determine whether any secondary services are down, and recover them.

  4. Determine service and data recovery timeframes (Principal Engineer)

  5. If service is likely to be degraded for more than 15 minutes, direct the Communications Team to contact Customers and Resellers to make them aware of the situation (Managing Director)

    1. Update the service updates page and direct customers to review it for updates: https://onlinephotosubmission.com/service-updates

  6. Review and improve the process to prepare for future disasters (Managing Director and Principal Engineer)
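Parts of the monitoring in step 3 could be scripted. The following is a minimal, hypothetical sketch that assumes the production database is Amazon RDS and the application runs in an EC2 Auto Scaling group; the function names and resource identifiers are illustrative and are not part of CloudCard’s existing tooling. It relies only on the boto3 calls `describe_db_instances`, `create_db_instance_read_replica`, and `describe_auto_scaling_groups`, and takes the clients as parameters so the logic can be exercised without AWS credentials.

```python
# Hypothetical post-failover checks for "Disasters affecting AWS Availability Zone(s)".
# Assumes Amazon RDS for the database and an EC2 Auto Scaling group for the app;
# all identifiers passed in are placeholders, not CloudCard's real resource names.
from typing import Any, List


def check_database(rds: Any, db_id: str) -> List[str]:
    """Step 3.1: report problems with the RDS primary after failover."""
    resp = rds.describe_db_instances(DBInstanceIdentifier=db_id)
    db = resp["DBInstances"][0]
    problems = []
    if db["DBInstanceStatus"] != "available":
        problems.append(f"{db_id} status is {db['DBInstanceStatus']}")
    if not db.get("ReadReplicaDBInstanceIdentifiers"):
        problems.append(f"{db_id} has no read replicas")
    return problems


def add_read_replica(rds: Any, source_id: str, replica_id: str) -> None:
    """Step 3.1: request an additional read replica of the primary."""
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=source_id,
    )


def check_auto_scaling(asg: Any, group_name: str) -> List[str]:
    """Step 3.2: report if there are fewer healthy, in-service nodes than desired."""
    resp = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    group = resp["AutoScalingGroups"][0]
    healthy = [
        inst for inst in group["Instances"]
        if inst["LifecycleState"] == "InService" and inst["HealthStatus"] == "Healthy"
    ]
    desired = group["DesiredCapacity"]
    if len(healthy) < desired:
        return [f"{group_name}: {len(healthy)}/{desired} healthy instances"]
    return []
```

In practice the clients would be created with `boto3.client("rds")` and `boto3.client("autoscaling")`. An empty combined result suggests failover and auto scaling have recovered; any returned message points to the sub-step (3.1 or 3.2) that still needs manual attention.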

Data / Infrastructure Sabotage or Human Error

In this scenario the assumption is that an attacker or an employee has intentionally or accidentally tampered with production resources to such an extent as to cause a major outage.

  1. Triage: determine the actor responsible for the sabotage or error.

    1. If more appropriate, follow the Incident Response Plan.

  2. Contain the incident so that the actor cannot gain further access or take further action.

  3. Follow the steps in Disasters affecting AWS Availability Zone(s) to restore services.

  4. Take appropriate legal, disciplinary or training action.

Outages of Business Critical Services

Google Workspace email unavailable:

  1. Update the CloudCard service updates page to indicate the outage.

  2. The Customer Support team communicates with customers via HelpScout.

  3. Review news from Google to determine the timeline for restoration of services.

  4. As a last resort, if service is unlikely to be restored for a significant period and other email providers are unaffected, set up temporary or permanent operations on a different email provider and repoint the DNS (MX) records.

...

If it is impossible to find a location within 12 hours’ driving time of Downtown Lynchburg that is safe to work from, connected to the internet, and reachable by at least some employees, we assume that it is immaterial for CloudCard to focus on continuity efforts at this time. The Managing Director should monitor the situation until a safe location becomes available, and maintain regular communication with employees to whatever degree possible to ensure their safety and arrange help where needed.

Distributed Denial of Service (DDoS) Attacks

AWS provides baseline protection against DDoS and similar attacks (AWS Shield Standard). If an attack exceeds what the built-in AWS protection can handle, contact AWS Support for assistance.

Death or incapacitation of key leader

CloudCard’s Managing Director and Principal Engineer, along with anyone in another executive-level role, should each designate another employee as their alternate. The alternate should be briefed on the responsibilities of the role and be able to carry them out on an interim basis if the person in that role dies, is incapacitated, or is otherwise unable to work or advise for a prolonged period. The adequacy of the alternate’s briefing should be evaluated whenever the executive is on vacation, and a debrief after each such vacation should identify the areas of the briefing that need improvement.

Small-Scale Events that are out of scope

...

  • Loss of connectivity for a single employee.

  • Laptop failure for a single employee.

  • Loss of availability of a production application or service necessary to CloudCard’s operations that either (a) does not affect all of CloudCard’s core services, or (b) is short-lived (an outage lasting less than 4 hours) (see the Incident Response Plan).
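The "either (a) ... or (b)" wording means a service outage is out of scope when either condition holds, not only when both do. A minimal sketch of that rule follows; the `Event` fields and the `is_out_of_scope` function are hypothetical and assume the triage team can estimate the outage length up front.

```python
# Hypothetical encoding of the out-of-scope rules listed above.
# The Event fields are illustrative; the plan itself defines no such structure.
from dataclasses import dataclass


@dataclass
class Event:
    single_employee: bool            # e.g. one person's connectivity or laptop failure
    affects_all_core_services: bool  # does the outage hit every core service?
    expected_outage_hours: float     # best estimate at triage time


def is_out_of_scope(event: Event) -> bool:
    """Return True if this continuity plan does not apply to the event."""
    if event.single_employee:
        return True
    # Out of scope if (a) not all core services are affected, or (b) short-lived (< 4 h).
    return (not event.affects_all_core_services) or event.expected_outage_hours < 4
```

For example, `is_out_of_scope(Event(False, True, 6))` returns `False`, so a six-hour outage of all core services falls under this plan, while the same outage expected to last two hours does not.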

Appendix: Asset RPO and RTO

...