UK. July 17, 2012 — A serious outage on the O2 UK mobile network last week is being ascribed to the failure of a Home Location Register (HLR) software system. As reported on our sister site, VanillaPlus.com, the wholly-owned Telefonica subsidiary, which trades in the UK under the O2 brand, lost service to a large number of subscribers for more than 12 hours on July 11 and 12.
The outage did not affect all subscribers, but many were unable to connect to the network for almost a day. The system failure, say analysts, meant that the network was unable to recognise certain devices, so mobile users standing side-by-side in the same cell could be affected differently, with one able to connect to the network and receive normal service while the other lost service completely.
O2 apologised on Thursday for the failure, saying that several back-up systems had also failed, compounding the problem and leaving as many as 7 million people without a connection. CEO Ronan Dunne appeared on BBC TV News to apologise to customers for the loss of service, which he called an “exceptional situation”, and said that “multiple layers of redundancy” had failed.
The failure began at 1.30pm on July 11. Voice communication was the first service to be fully restored early on Thursday July 12, with 3G data connections resuming during Thursday morning.
Unsurprisingly, Dunne is promising a “root and branch review” of the network in the coming days. But the reputational damage to Telefonica’s subsidiary, one of the UK’s largest mobile phone networks, is hard to over-estimate. Industry observers say that the additional pressure on all networks during the Olympics, which are being hosted in London and around the UK starting on July 27, means that failures like this could well be repeated by O2 or other network operators.
O2 is not prepared to talk about compensation, nor is it contractually obliged to offer it. But steps will have to be taken to ensure that large enterprise customers, including M2M service users, are not spooked by this failure of systems and back-ups into churning to other providers. Meanwhile, the alternative providers, notably Vodafone, 3, and Everything Everywhere (Orange & T-Mobile), are likely to be reviewing their operations support systems (OSS) and the resilience of their own network back-ups.
If your business has been affected by either the O2 UK or Orange France service failures, let us know about your experiences by emailing:
Yankee Group principal analyst, Ken Rehbehn commented, “Two prolonged major mobile network outages in Europe, at O2 in the UK and Orange in France (in June), shine a strong light on the fragile nature of today’s complex mobile systems. While not the first — and surely not the last — major failure for a mobile operator, the events suggest a new reality: the days of ‘five-nines’ reliability for major communications networks are long gone. Largely, it is a reflection of the challenge faced by system designers tasked with supporting massive scale for customers that move about. While the impact of failure for most components of a mobile network is limited to small regions, such as an area around a single tower site, registration database failures tend to have system-wide catastrophic impact.”
“To a certain extent, we will never be free of this risk as these systems are highly scaled and widely distributed. But the failures — as well as failures in the early days of Verizon Wireless LTE core network — help build an experience base that helps the industry harden networks and avoid future problems. For suppliers, the failures actually help solidify market position by making it clear how risky it is to make changes to an operational environment. The barrier to entry in this market has just become higher,” added Rehbehn. “Incumbents such as Ericsson, Alcatel-Lucent and Nokia Siemens Networks that have battled with these complex deployments can point to real-world success and learn from real-world failures; the seasoning from both helps build operator confidence for future deployments.
“For operators, these events point to a need for greater operational diligence and improved crisis response that helps shore up reputation. The events show that customer experience management must go beyond the mundane issue of billing questions to include catastrophic failure contingency plans. It is sometimes said that what does not kill you, makes you stronger. In the case of the recent mobile network failures, we may have an opportunity to see if the old adage is true,” he concluded.
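To put Rehbehn's "five-nines" remark in numbers, the short calculation below compares the downtime that 99.999% availability permits in a year with an outage of the length reported here. It is an illustrative sketch only: the 12-hour figure is the article's lower bound ("more than 12 hours"), the function names are invented for this example, and the calculation assumes availability is measured from the point of view of an affected subscriber, which is not necessarily how an operator would report it.

```python
# Illustrative arithmetic only; function names are hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def years_of_budget_consumed(outage_hours: float, availability: float) -> float:
    """How many years' worth of downtime budget a single outage uses up."""
    return outage_hours * 60 / downtime_budget_minutes(availability)


# "Five-nines" means 99.999% availability.
five_nines_budget = downtime_budget_minutes(0.99999)

# Assumption: 12 hours, the lower bound reported for the O2 outage.
years_used = years_of_budget_consumed(12, 0.99999)

print(f"Five-nines allows about {five_nines_budget:.2f} minutes of downtime per year")
print(f"A 12-hour outage uses about {years_used:.0f} years of that allowance")
```

On these assumptions, five-nines availability leaves a little over five minutes of downtime a year, so an affected subscriber lost well over a century's worth of that allowance in a single incident.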