November 30, 2003

Issue Handling after an upgrade

After upgrading any system, issues always arise. No matter how much testing is done, something always goes unexamined. It's just one of the facts of life. So anticipating how to find these issues beforehand should be a part of the go-live plan.

The first step is determining how to capture the issue. For our recent CFMX upgrade, we had several methods. Users would call the help desk, and the issue would be routed to our group. In addition, we placed a site-wide error handler within CFMX. This error handler emailed us the IP address, the browser info, the time of the error, the referring page, the page causing the error, the query string on the page, and the error message. We also had people clicking through the site to find issues.
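To make the error email concrete, here is a minimal sketch in Python (not the CFMX handler we actually used) of collecting those seven fields into one message. The names `ErrorReport` and `build_error_email`, the subject line, and the addresses are assumptions for illustration only, and the function builds the email without sending it.

```python
from dataclasses import dataclass
from datetime import datetime
from email.message import EmailMessage


@dataclass
class ErrorReport:
    """The fields the post lists for each error email (names are hypothetical)."""
    ip_address: str
    browser: str
    occurred_at: datetime
    referrer: str
    page: str
    query_string: str
    message: str


def build_error_email(report: ErrorReport, to_addr: str) -> EmailMessage:
    """Format one error report as a plain-text email. Does not send it."""
    email = EmailMessage()
    email["To"] = to_addr
    email["Subject"] = f"Site error on {report.page}"
    email.set_content("\n".join([
        f"IP address: {report.ip_address}",
        f"Browser: {report.browser}",
        f"Time: {report.occurred_at.isoformat()}",
        f"Referring page: {report.referrer}",
        f"Failing page: {report.page}",
        f"Query string: {report.query_string}",
        f"Error message: {report.message}",
    ]))
    return email
```

Keeping every field on its own labeled line makes it easier to group similar errors later, whether by eye or with a script.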

The second step is deciding how to route and resolve the issue. All issues should go to a triage person, and this role needs to be filled for the whole time (a lesson learned). The triage person can quickly diagnose the severity of the issue: is this a quick fix that can be handed straight back, or do we need to get a better handle on it and analyze the full impact? The triage person either fixes the issue or routes it to 2nd level support. Because 2nd level support isn't tied up in the myriad of incoming issues, they have the time to properly analyze, code, test, and release the fix. Issue prioritization is also essential. User-identified problems are first on the list; second come the error handler issues, grouped and worked from the most frequently occurring to the most sparsely occurring.
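The prioritization rule can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not a tool we ran. The `Issue` type, the `"user"` and `"error_handler"` labels, and the idea of a `signature` (for example, page plus error message) are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    source: str     # "user" or "error_handler" (hypothetical labels)
    signature: str  # what identifies "the same" issue, e.g. page plus error message


def prioritize(issues):
    """Return a work queue of (source, signature, count) tuples.

    User-reported issues come first, in the order they arrived.
    Error handler issues follow, grouped by signature and sorted
    from most frequent to least frequent.
    """
    queue = [("user", i.signature, 1) for i in issues if i.source == "user"]
    handler_counts = Counter(
        i.signature for i in issues if i.source == "error_handler"
    )
    queue += [
        ("error_handler", signature, count)
        for signature, count in handler_counts.most_common()
    ]
    return queue
```

Grouping before sorting matters: one broken page can generate hundreds of emails, and counting them as a single issue keeps the queue readable for the triage person.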

The next step is to look for other places the issue will have an impact that haven't been captured yet, and preemptively resolve those as well.

After executing the plan, review it to see where it can be improved and where it was a hindrance. Admitting mistakes allows others to learn from them, so the company doesn't have to incur the cost of the same mistake again.

Finally, the last step is to explain to the questioners why this happened and how it can be caught preemptively, in either testing or the release process, next time. Whenever there is an issue, there is always a questioner. It is inevitable: the two are dependent, and one never occurs without the other.

Remind the questioner that, thanks to the action plan for issue resolution, customer issues were resolved in anywhere from 5 minutes to xx time, with an average of yy. The customer was also aware, immediately, that someone was working to resolve their problem.
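If resolution times are logged per issue, the fastest, slowest, and average figures fall out of a small calculation. The sketch below assumes each resolved issue has a duration recorded as a `timedelta`. The function name is hypothetical, and the xx and yy figures above are left as they are because this code does not supply them.

```python
from datetime import timedelta


def summarize_resolution_times(durations):
    """Return (fastest, slowest, average) for a non-empty list of timedeltas."""
    if not durations:
        raise ValueError("need at least one resolved issue")
    total = sum(durations, timedelta())  # start from a zero timedelta, not 0
    return min(durations), max(durations), total / len(durations)
```

For example, passing durations of 5, 25, and 60 minutes (made-up values) gives a fastest time of 5 minutes, a slowest of 60 minutes, and an average of 30 minutes.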

Seriously, providing a stable environment to the users is the most important thing, and having a plan in place to cover the hiccups that occur is a worthy exercise.

Posted by Elyse at November 30, 2003 6:36 AM