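Follow these steps to recover Cryptographic Security Platform and the deployed solutions after a system crash. The procedure depends on whether the installation has a single node or multiple nodes and, for a multi-node installation, on whether the cluster still has a quorum.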

As explained in Installing CSP, only prod-mode installations support disaster recovery.

Recovering single-node installations

When a single-node installation crashes, recover the Cryptographic Security Platform and the deployed solutions from a backup as explained in Restoring the state.

Recovering multi-node installations with a quorum

When your multi-node installation still has the quorum described in Required number of nodes, you only need to remove the crashed nodes from the cluster, restore them, and add them back.
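Whether the cluster still has a quorum decides which of the procedures below applies. The following bash sketch assumes that quorum means a strict majority of nodes, as in etcd-based Kubernetes clusters; check Required number of nodes for the rule that actually applies to your installation. The function names has_quorum and count_ready_nodes are hypothetical and are not part of Cryptographic Security Platform.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. ASSUMPTION: quorum means a strict majority of nodes
# (the etcd rule). Check "Required number of nodes" for the rule CSP uses.

# has_quorum <total-nodes> <healthy-nodes>
# Succeeds (exit status 0) if healthy * 2 > total, that is, a strict majority.
has_quorum() {
  local total="$1" healthy="$2"
  [ $((healthy * 2)) -gt "$total" ]
}

# count_ready_nodes
# Prints how many nodes have a STATUS column starting with "Ready".
# This also matches "Ready,SchedulingDisabled" (cordoned but healthy nodes)
# and does not match "NotReady".
count_ready_nodes() {
  sudo kubectl get nodes --no-headers | awk '$2 ~ /^Ready/ { n++ } END { print n + 0 }'
}
```

For example, on a three-node installation, has_quorum 3 "$(count_ready_nodes)" succeeds while at least two nodes report Ready. If the cluster has lost its quorum, kubectl may be unable to reach the API server at all; in that case count_ready_nodes prints the kubectl error and reports 0.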

To recover a crashed node

  1. Mark the node as unschedulable. 

    sudo kubectl cordon <node-to-delete>
  2. Drain the pods from the node that you are deleting. 

    sudo kubectl drain <node-to-delete> --delete-emptydir-data --disable-eviction --force --ignore-daemonsets --timeout=600s
  3. Delete the node from the cluster. 

    sudo kubectl delete node <node-to-delete> --timeout=600s
  4. Restore the node and add it back to the cluster, as explained in Adding nodes.
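If you recover nodes often, you might wrap steps 1 to 3 in a small script. The following is a minimal bash sketch, not a tool shipped with Cryptographic Security Platform; the function names remove_crashed_node and run and the DRY_RUN variable are hypothetical. It runs exactly the commands from steps 1 to 3 and stops at the first command that fails. Step 4 stays manual because restoring and re-adding a node follows Adding nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping steps 1-3 (cordon, drain, delete).
# Step 4 (restore the node and add it back) is NOT automated here.
# Set DRY_RUN=1 to print the commands instead of running them.

# run <command...>: executes the command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# remove_crashed_node <node-to-delete>
remove_crashed_node() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: remove_crashed_node <node-to-delete>" >&2
    return 2
  fi
  # Step 1: mark the node as unschedulable.
  run sudo kubectl cordon "$node" || return
  # Step 2: drain the pods. --disable-eviction deletes pods directly instead of
  # using the Eviction API, so PodDisruptionBudgets are not checked.
  run sudo kubectl drain "$node" --delete-emptydir-data --disable-eviction \
    --force --ignore-daemonsets --timeout=600s || return
  # Step 3: delete the node from the cluster.
  run sudo kubectl delete node "$node" --timeout=600s
}
```

After sourcing the file, DRY_RUN=1 remove_crashed_node worker-2 prints the three commands for a node named worker-2 without running them; run remove_crashed_node worker-2 without DRY_RUN=1 to execute them.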

Recovering multi-node installations without a quorum

When your multi-node installation has lost the quorum described in Load balancing requirements, you must uninstall Cryptographic Security Platform from all nodes and restore it from a backup.

To recover a multi-node deployment without a quorum

  1. On every node, run clusterctl uninstall to uninstall Cryptographic Security Platform.
  2. Recover the Cryptographic Security Platform and the deployed solutions from a backup as explained in Restoring the state.
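When the nodes are reachable over SSH, step 1 can be scripted. This bash sketch assumes passwordless SSH and sudo access on each node and that clusterctl is on each node's PATH; the function names uninstall_all_nodes and run, the DRY_RUN variable, and the node names are hypothetical. It continues past a failing node so that you see every failure before you restore from backup.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 1: runs "clusterctl uninstall" on every node.
# ASSUMPTIONS: passwordless SSH and sudo on each node; clusterctl on the remote PATH.
# Set DRY_RUN=1 to print the commands instead of running them.

# run <command...>: executes the command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# uninstall_all_nodes <node> [<node> ...]
# Returns 0 if every node succeeded, 1 if any node failed, 2 on bad usage.
uninstall_all_nodes() {
  if [ "$#" -eq 0 ]; then
    echo "usage: uninstall_all_nodes <node> [<node> ...]" >&2
    return 2
  fi
  local node failed=0
  for node in "$@"; do
    # Keep going on failure so that every failing node is reported.
    if ! run ssh "$node" sudo clusterctl uninstall; then
      echo "clusterctl uninstall failed on $node" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, DRY_RUN=1 uninstall_all_nodes node-1 node-2 node-3 prints the command for each node without running it. Continue with step 2 only after the function returns 0 for a real run.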