We will all have to deal with bad things happening that we couldn't plan for. When they happen, we or our organization have to keep working with a reasonable level of success.
We have chaos engineering for distributed systems and microservices, which lets us verify that a system will keep performing if random components stop working as planned or fail completely. Many organizations spend time setting up disaster recovery mechanisms to deal with an entire data center going down. Still, most have nothing in place for when an employee is incapacitated.
Since starting my career, I have become obsessed with putting processes in place for when I cannot show up. This has allowed me to accept promotions or new opportunities without worrying about my team's fate.
There are some lessons from disaster recovery plans that I use when planning for the bus factor, namely:
- Recovery Time Objective: How long will my team take to start operating well without me?
- Recovery Point Objective: How often do I update the team on what is happening, so they can quickly continue where I stopped?
- Inventory of information and initiatives: I need one or more easy-to-access sources of knowledge about everything that is happening.
- Identify Personnel Roles: The people who will be critical to the success of a sudden succession plan.
- Run continuous practice tests to ensure the plan is effective: How often do I let other people take control so they get used to having the authority?
There are other elements of disaster recovery planning that I have left aside until I find a way to practice them in my everyday life.