What sort of disaster would it take to bring the operations of the third-largest airline in the world to a halt? It must have been some kind of “perfect storm,” right? Apparently, a single power outage at its headquarters was all it took, as Delta Air Lines discovered this week.
In the early morning of Monday, August 8, a power failure at Delta’s Atlanta headquarters took down its computer systems, forcing Delta to ground every flight for over six hours. This in turn caused hundreds of flight cancellations and even more delays worldwide, and as of Wednesday morning, flight schedules still had not been fully restored. Nearly 2,000 flights have been canceled so far, and the cost of lost operations and refunds could exceed $30 million, an estimate based on Delta’s 2015 gross revenue of $40.7 billion.
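To get a feel for where a figure like that could come from, here is a rough back-of-the-envelope sketch. It assumes revenue accrues evenly across every hour of the year, which is a simplification and not necessarily how the original estimate was made; the function name is hypothetical.

```python
# Back-of-the-envelope estimate only. Assumes revenue accrues evenly
# across every hour of the year, which real airline revenue does not.
ANNUAL_REVENUE_USD = 40.7e9      # Delta's 2015 gross revenue, as cited above
HOURS_PER_YEAR = 365 * 24


def prorated_revenue(hours: float, annual_revenue: float = ANNUAL_REVENUE_USD) -> float:
    """Revenue attributable to `hours` of operation under even proration."""
    return annual_revenue * hours / HOURS_PER_YEAR


if __name__ == "__main__":
    six_hours = prorated_revenue(6)
    print(f"Six hours of prorated revenue: ${six_hours / 1e6:.1f} million")
```

Under that assumption, six hours of grounded flights alone works out to roughly $28 million, so a total above $30 million is plausible once refunds and the days of lingering cancellations are counted.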
How Did This Happen?
Unfortunately for Delta, there is really no good answer to this question. Law enforcement has commented that the power outage did not appear to be an act of sabotage, and Georgia Power maintains that its power grid was not to blame. It appears that a piece of Delta’s own switchgear was the source of the power failure.
However, a bigger question remains: Why would a single power outage at a single location impact worldwide operations so dramatically? The answer puts even more of the onus on Delta: either Delta didn’t have a backup power source and redundant systems in place, or its business continuity and/or disaster recovery plan failed.
Today, this type of failure is simply unacceptable. The fact that so many systems depended on one power source without proper failover measures in place is shocking. Even stranded passengers saw the absurdity of the situation, calling it “ridiculous” and “just a mess.” Technology is too important a component of most businesses for companies to neglect the foundational planning and testing that prevent outages.
What Does This Mean for Delta?
In addition to the lost revenue and the PR debacle that this incident has caused, it has exposed some glaring technological and security flaws. The fact that such a major issue was completely overlooked suggests that Delta’s business and technology strategies are out of alignment and disconnected from today’s technological requirements and demands. Having the right technology in place, along with a disaster recovery plan to keep systems up and running, is essential to any company, and one would expect the world’s third-largest airline to be aware of this. (Although Southwest’s similar debacle in July shows that this is not a Delta-specific issue.)
Another outcome of this incident is that it has exposed an Achilles’ heel for Delta. An attacker would now be aware of an easy way to take down Delta’s entire infrastructure. If an accident caused this much chaos, imagine how much damage a malicious attacker could do by deliberately targeting the same weakness.
A Cautionary Tale
Hopefully this event prompts CIOs and CTOs around the world to review their business continuity and disaster recovery plans and identify any single points of failure within their systems. A single component should never be able to cripple such large-scale systems! While Delta’s CEO has issued an apology, it remains to be seen whether anyone will lose their job over this blunder. Delta’s relatively good on-time record has been destroyed by this incident, and it will certainly be a while before customer and shareholder confidence returns to normal. The moral of the story is: Don’t let this be your company; review the risks and impacts of technology outages on your business and plan accordingly. There are too many options for keeping business services running during a significant incident for an outage like this to be excusable. If you are Delta, you now have a very solid business case for multiple datacenters around the world providing services so that even if two are completely lost, business operations will continue as normal.
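For readers who want to start that review, the check can be sketched in a few lines. This is a minimal illustration rather than a real audit tool: the service names, the site names (`dc-a`, `dc-b`, `dc-c`), and the `services_at_risk` function are all hypothetical, and it assumes a service stays up as long as at least one of its hosting sites survives.

```python
from itertools import combinations

# Hypothetical inventory: which sites host each service.
example_placement = {
    "check-in": {"dc-a"},
    "crew-scheduling": {"dc-a", "dc-b"},
    "booking": {"dc-a", "dc-b", "dc-c"},
}


def services_at_risk(placement: dict[str, set[str]], failures: int) -> dict[str, tuple[str, ...]]:
    """Map each service that some loss of `failures` sites would take down
    to one combination of lost sites that does it.

    Assumes a service survives while at least one hosting site is up.
    """
    sites = sorted(set().union(*placement.values()))
    at_risk: dict[str, tuple[str, ...]] = {}
    for lost in combinations(sites, failures):
        lost_set = set(lost)
        for service, hosts in placement.items():
            # The service is down only if every site hosting it is lost.
            if service not in at_risk and hosts <= lost_set:
                at_risk[service] = lost
    return at_risk


if __name__ == "__main__":
    for k in (1, 2):
        print(f"Lose any {k} site(s):", services_at_risk(example_placement, k))
```

In this made-up inventory, only the service hosted at three sites survives the loss of any two, which is the standard the closing recommendation sets. A real review would also have to cover power, network, and data dependencies that a simple site count does not capture.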