OSOM independent report is in

By: Mick Bourke

  11.00 AM 18 January, 2010

After the One Source One Message (OSOM) system failure on Wednesday 16 December 2009, we commissioned an independent review by Deloitte to find out what went wrong. I talked about this in my last blog post when I said:

"It's important to clarify that the CFA website itself was unaffected; the problem was with the messages being fed to the website. The computer server that enables the messages to go to the website failed . . . We also immediately commissioned a full and independent audit of the OSOM system by auditors Deloitte to make sure all actual and potential pitfalls are identified."

When the report was commissioned, I said I wanted to make the findings public. Now that it's complete, I'd like to give you more details.

To be transparent about what happened and what we've done to fix the problem, I need to use some technical language so those who understand information technology can fully appreciate the issues. I'm going to try to explain it in everyday terms as well.

At the time of the failure, before Deloitte was commissioned, we identified an operational issue with the proxy server that the system uses to generate the web pages displayed on the CFA website. (The proxy server is essentially a computer that sits between the system and the CFA website. All messages go through this server. Proxy servers allow us to speed up network traffic.)

Deloitte identified the root cause of the proxy server failure: a proxy log file reached its capacity limit, which caused the proxy service to freeze. Web traffic to the proxy server over a prolonged period filled the log file to its limit before the server's programs were due to archive its contents.

This problem wasn't anticipated during development and testing because test cases used for log archiving and limits did not breach the file limit set on the log file. This means the proxy server wasn't pushed to its maximum limit during the test phase, which was a weakness in our testing procedures.
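For readers who work in information technology, here is a minimal sketch in Python of the kind of test that was missing: one that keeps writing until the limit is actually reached. It is only an illustration and not the OSOM test suite. The names `BoundedLog`, `LogFullError` and `entries_until_full`, and the sizes used, are assumptions made for the example.

```python
class LogFullError(Exception):
    """Raised when a write would push the log past its size limit."""


class BoundedLog:
    """Toy stand-in (hypothetical) for a log file with a hard size limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.size = 0

    def write(self, entry):
        # Refuse the write if it would exceed the limit.
        if self.size + len(entry) > self.limit_bytes:
            raise LogFullError("log file has reached its size limit")
        self.size += len(entry)


def entries_until_full(log, entry):
    """Write the same entry repeatedly and count how many fit before the limit."""
    if not entry:
        raise ValueError("entry must not be empty")
    count = 0
    try:
        while True:
            log.write(entry)
            count += 1
    except LogFullError:
        return count
```

A test built this way deliberately drives the log to its limit, so whatever happens at the limit (here, an error) is seen during testing rather than in live operation.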

To improve the system and make sure this doesn't happen again, we're putting a number of measures in place including:

  • updating the proxy server so the platform checks log files every 10 minutes and archives a log if it exceeds half its capacity
  • implementing additional monitoring to identify performance issues with the proxy server
  • refining the 'fail over' process from the CFA proxy server to the DSE proxy server to reduce the time it takes to perform, and
  • implementing a second, fully redundant proxy server.
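
For the technically minded, the first measure in the list above can be sketched in a few lines of Python. This is an illustration only and not the code running on the proxy server: the function names, the archive file naming and the way the log is emptied are assumptions made for the example. Only the 10-minute interval and the half-capacity threshold come from the description above.

```python
import os
import shutil
import time

CHECK_INTERVAL_SECONDS = 10 * 60  # check every 10 minutes
ARCHIVE_THRESHOLD = 0.5           # archive once the log passes half its capacity


def archive_if_needed(log_path, archive_dir, capacity_bytes):
    """Archive the log if it is more than half full.

    Returns the path of the archive copy, or None if nothing was archived.
    """
    if not os.path.exists(log_path):
        return None
    if os.path.getsize(log_path) <= capacity_bytes * ARCHIVE_THRESHOLD:
        return None

    os.makedirs(archive_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme for the archived copy.
    archive_path = os.path.join(archive_dir, f"proxy-{stamp}.log")
    shutil.copy2(log_path, archive_path)

    # Empty the live log. A real service would need to coordinate this
    # with the process writing to the log so no entries are lost.
    with open(log_path, "w"):
        pass
    return archive_path


def monitor(log_path, archive_dir, capacity_bytes):
    """Run the check on a fixed schedule until the process is stopped."""
    while True:
        archive_if_needed(log_path, archive_dir, capacity_bytes)
        time.sleep(CHECK_INTERVAL_SECONDS)
```

Checking at half capacity rather than at the limit leaves room for the log to keep growing between checks, even during a period of heavy traffic.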

I'd like to acknowledge the advice and involvement of the experts at the Technology Group of the Government Services Group within the Department of Treasury and Finance, and to thank them, Deloitte and the Technology Services team at CFA for their input.

Once again, my apologies for the technical language. I hope this gives you an insight into what went wrong with the system and what we have done to ensure it doesn't happen again.

Last Updated: 10 December 2015