Perform the following steps to recover Cryptographic Security Platform and the deployed solutions after a system crash.
As explained in Installing CSP, only prod-mode installations support disaster recovery.
Recovering single-node installations
When a single-node installation crashes, recover the Cryptographic Security Platform and the deployed solutions from a backup as explained in Restoring the state.
Recovering multi-node installations with a quorum
When your multi-node installation retains the quorum described in Required number of nodes, remove each crashed node from the cluster, restore it, and add it back.
To recover a crashed node
- Mark the node as unschedulable.
sudo kubectl cordon <node-to-delete>
- Drain the pods from the node.
sudo kubectl drain <node-to-delete> --delete-emptydir-data --disable-eviction --force --ignore-daemonsets --timeout=600s
- Delete the node from the cluster.
sudo kubectl delete node <node-to-delete> --timeout=600s
- Restore the node and add it back to the cluster, as explained in Adding nodes.
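The three removal commands above can be chained in a small script so that a failed step stops the sequence. The following is a minimal sketch, not part of the product: the function name remove_crashed_node and the KUBECTL variable are hypothetical names introduced here for illustration. Setting KUBECTL to "echo kubectl" prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: runs the cordon, drain, and delete commands from the steps above
# for one crashed node. KUBECTL is a hypothetical override, not a product option;
# set KUBECTL="echo kubectl" to print the commands instead of running them.
set -euo pipefail

KUBECTL="${KUBECTL:-sudo kubectl}"

remove_crashed_node() {
  local node="$1"

  # Mark the node as unschedulable.
  $KUBECTL cordon "$node"

  # Drain the pods from the node.
  $KUBECTL drain "$node" --delete-emptydir-data --disable-eviction \
    --force --ignore-daemonsets --timeout=600s

  # Delete the node from the cluster.
  $KUBECTL delete node "$node" --timeout=600s
}

# Run only when a node name is passed on the command line.
if [ "$#" -gt 0 ]; then
  remove_crashed_node "$1"
fi
```

Because of set -e, the script exits at the first command that fails, so the node is not deleted if the drain does not complete. Restoring the node and adding it back remain manual steps, as explained in Adding nodes.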
Recovering multi-node installations without a quorum
When your multi-node installation does not retain the quorum described in Load balancing requirements, follow the steps below for recovery.
To recover a multi-node deployment without a quorum
- Run clusterctl uninstall on every node to uninstall Cryptographic Security Platform.
- Recover the Cryptographic Security Platform and the deployed solutions from a backup as explained in Restoring the state.
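The uninstall step above has to be repeated on every node, which can be scripted when the nodes are reachable over SSH. The following is a rough sketch under that assumption. SSH access, the function name uninstall_on_all_nodes, and the REMOTE variable are all illustrative and do not come from the product documentation. Whether clusterctl uninstall needs elevated privileges or asks for confirmation is not covered here, so check that on one node first.

```shell
#!/usr/bin/env bash
# Sketch only: runs "clusterctl uninstall" on each node over SSH.
# Assumptions (not from the product documentation): SSH access to every node,
# and the hypothetical REMOTE override, which defaults to "ssh".
# Set REMOTE="echo ssh" to print the commands instead of running them.
set -euo pipefail

REMOTE="${REMOTE:-ssh}"

uninstall_on_all_nodes() {
  local node
  for node in "$@"; do
    # Progress message goes to stderr so stdout carries only command output.
    echo "Uninstalling on ${node}" >&2
    $REMOTE "$node" clusterctl uninstall
  done
}

# Run only when node names are passed on the command line.
if [ "$#" -gt 0 ]; then
  uninstall_on_all_nodes "$@"
fi
```

The script covers only the uninstall step. Recovering from the backup still follows Restoring the state.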