An investigation into why a major government IT shutdown happened, and what can be done to prevent a repeat incident, could provide answers by the ‘end of next week’.
The States of Guernsey has been experiencing widespread IT disruption since 25 November.
The trouble began when the air conditioning in the main server room failed. Temperatures climbed to 48 degrees Celsius, sending the equipment into meltdown.
‘Public-facing’ services such as benefits payments, school emails and Wi-Fi, and online flight and harbour information were all offline.
The major disruption lasted for eight days while engineers worked to restore services.
An investigation has begun to pinpoint the main cause and to expose any underlying issues within the States’ server room.
Guernsey’s Head of Public Services, Mark De Garis, says the results of that investigation will be made public as soon as possible. He estimates the main conclusions will be completed by the end of next week, although some ‘additional log file analysis’ could delay the release.
He explains what they currently understand to be the series of failures which led to the prolonged system outage:
"There was a failing of the air conditioning unit, it backed up with a secondary air conditioning unit, but that didn’t work either, which then caused the temperatures to rise.
Then, our systems - which are mirrored to a second site as well as being backed up to a third independent site - didn’t fail over quickly enough for services just to run seamlessly as they should.
We have over 200 different applications. The server room has got approximately 500 servers in it. This is a really complex picture, with a whole sequence of actions that need to happen."
He’s revealed to Island FM why some users may still experience issues online:
"All of our services are functioning, but we continue to suffer ‘performance issues’ is the best way of describing them, because as we bring services back up and check them, then they inadvertently cause disruption elsewhere in the system."
Mr De Garis says that while the stored information was protected, extensive and expensive physical damage has been done:
"The data across the States of Guernsey sites was safe. There was no threat to that. It was actually backed up in three separate locations, so the data was secure at all times.
Some of the equipment has suffered heat damage, and those elements are being replaced by engineers this week. It’s far too early to give a full figure of the cost for the States of Guernsey, but it will have incurred significant cost."
He says they are already working to prevent similar disruptions:
"We’ve been aware that the resilience isn’t what it should be.
One of the reasons for this is that we are dealing with a number of old systems, called legacy systems.
Each committee had their own IT system supporting its users, but these are on very old platforms - some of which you can’t buy these days - so if they do break we are reliant on patching them to keep them going until we can move them across to a new modern platform.
We have started doing that. 89 of our systems have moved across into a new IT infrastructure. This has been done over the past three years and the plan is to complete that for all of our systems by the end of next year."