Companies invest a lot of time and thought in ways to reduce risk in their systems. Minimizing that risk helps a business avoid disrupting the thousands of applications that rely on those systems. However, no system is perfect. Accidents happen and failures occur; it's just a matter of time.
What matters is what happens after an incident. There's a lot we stand to learn from failure. That is why we conduct post-mortems: a retrospective on what went wrong and how to keep it from happening again. As an operations team prepares for the eventualities that may cause a system to fail, it also needs a post-mortem strategy for that plan to be effective. Keep reading for some quick tips on how to run a post-mortem for every incident.
What Makes a Good Post-Mortem?
Every company that appreciates the value of looking back after an incident is always trying to improve its post-mortem technique. A post-mortem is a critical part of preventing failures and outages in our systems. Keep the following in mind when you conduct a post-mortem on an incident:
- Ensure that changes have been made to prevent the incident from recurring.
- Summarize the effect of the incident on your customers for shareholders and the support team.
- Make sure that the technical team members are educated on the issues that caused the incident.
The person who was on call during the incident (that is, the primary person who dealt with it) should write up a post-mortem immediately afterward. The team should then go through the post-mortem at its next meeting. Every incident should be discussed, whether small (a near miss that could have been dangerous) or large (one that resulted in a failure or outage). Here are some of the questions that should be asked during the meeting:
- What time did the incident occur?
- Did it have any impact on users/customers?
- What was the timeline for the events?
- What can be concluded from the post-mortem?
- What actions should be taken?
The post-mortem needs to show what went wrong and the order in which the team responded. Just saying that a particular user behaved in an unpredictable way or that a process died isn't enough. Going through the entire sequence in order helps you learn where to look for symptoms of failure in the future.
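One way to keep the write-up honest about ordering is to capture it as structured data rather than free text. The following is a minimal sketch, not a prescribed format: the `PostMortem` and `TimelineEvent` classes, their field names, and the sample incident are all hypothetical, chosen only to mirror the questions listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEvent:
    at: datetime      # when the event or response step happened
    description: str  # what was observed or done


@dataclass
class PostMortem:
    # Hypothetical structure; one field per question from the meeting list.
    title: str
    started_at: datetime   # What time did the incident occur?
    customer_impact: str   # Did it have any impact on users/customers?
    timeline: list[TimelineEvent] = field(default_factory=list)
    conclusions: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)

    def ordered_timeline(self) -> list[TimelineEvent]:
        """Return events in chronological order, whatever order they were entered in."""
        return sorted(self.timeline, key=lambda event: event.at)

    def missing_sections(self) -> list[str]:
        """Name the sections that are still empty, so gaps are visible before the meeting."""
        sections = {
            "customer_impact": self.customer_impact.strip(),
            "timeline": self.timeline,
            "conclusions": self.conclusions,
            "action_items": self.action_items,
        }
        return [name for name, value in sections.items() if not value]


# Sample incident invented purely for illustration.
pm = PostMortem(
    title="Example outage",
    started_at=datetime(2024, 1, 1, 9, 0),
    customer_impact="Checkout requests failed for some users.",
    timeline=[
        TimelineEvent(datetime(2024, 1, 1, 9, 20), "On-call engineer restarted the service"),
        TimelineEvent(datetime(2024, 1, 1, 9, 5), "Alert fired"),
    ],
)

for event in pm.ordered_timeline():
    print(event.at.strftime("%H:%M"), event.description)
print("Still to fill in:", pm.missing_sections())
```

Sorting by timestamp means the on-call person can jot events down as they remember them, and the empty-section check gives the team a quick way to see what still needs an answer before the review meeting.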
A Post-Mortem Must Have an Agenda
When conducting a post-mortem, the last thing you want is a disorganized meeting that ends after an hour with nothing learned. You need an agenda; even the most relaxed meetings need one. Make sure that the agenda addresses all the issues raised by the post-mortem.
A successful outage resolution must go hand in hand with a comprehensive post-mortem. Give your team an opportunity to learn by making sure that everything is documented correctly. Doing so gives the company a chance to grow and makes it far less likely that the same mistakes will be repeated.