While T-Mobile continues to make sense of Monday’s network failure, President of Technology Neville Ray said a fiber-optic circuit failed, and its backup circuit also crashed. That caused a chain reaction that strained the network so much that many calls and texts couldn’t be transmitted, reported The Verge.
“We didn’t meet our own bar for excellence,” explained Ray in a blog post. “Many of our customers experienced a voice and text issue [Monday], specifically with VoLTE (Voice over LTE) calling. My team took immediate action — hundreds of our engineers worked tirelessly alongside vendors and partners throughout the day to resolve the issue starting the minute we were aware of it.”
Data connections continued to work for many customers, as did T-Mobile’s non-VoLTE calling. Customers could still use services like FaceTime, iMessage, Google Meet, Zoom, Skype and others to stay in touch, and many were able to place calls over circuit-switched voice connections. Customers on the Sprint network were unaffected, according to Ray. VoLTE and text in all regions fully recovered by 10 p.m. PDT Monday. The network “is fully operational… and we’re working day in and day out to keep it that way,” he wrote.
T-Mobile’s engineers discovered the trigger event was a “leased fiber circuit failure from a third-party provider in the Southeast,” wrote Ray. Such circuit failures happen on every mobile network, he noted, so T-Mobile had worked with vendors to build in redundancy and resiliency to keep them from affecting customers. Ray further explained: “This redundancy failed us and resulted in an overload situation that was then compounded by other factors. This overload resulted in an IP traffic storm that spread from the Southeast to create significant capacity issues across the IMS (IP multimedia Subsystem) core network that supports VoLTE calls.”
Ray said T-Mobile worked with its IMS and IP vendors to add permanent safeguards to prevent a recurrence. The carrier is still investigating what caused the initial overload failure.