Alrighty then. Skype is back today, purring along nicely with eight million plus users logged into the network. I'm glad; I think Skype's cool. But the post mortem contains some interesting questions. Mainly, could such a global outage--probably the biggest yet in VoIP's short history--happen again?
The whole business was touched off August 16 when a big batch of Microsoft updates caused many Skype users' computers to shut down and reboot at around the same time. "The disruption was triggered by a massive restart of our users' computers across the globe within a very short timeframe," Skype says. But a careful read of Skype's explanation today shows that the restart led to two separate meltdown-inducing problems:
First, Skype says, when all those user PCs logged off, it "created a lack of peer-to-peer resources." Wassat? Peer-to-peer (P2P) networks like Skype's rely on user PCs at the edges of the network to act like little volunteer phone switches that help route calls to and from other users in that area of the network. So when millions of those "nodes" went offline, Skype's "network resources"--its means of delivering calls--were severely depleted.
Second, Skype says that when those millions of users rebooted and attempted to log back on to Skype, the system was overloaded and became "unstable." This, Skype had said throughout the outage, was the primary problem. I don't think so. I think the real damage had already been done by then: I think that when all those crucial peers started disappearing from Skype's network, it could no longer route calls normally and the whole system began collapsing on itself.
To my ear, that second reason, the "log on problem," sounds like a red herring designed to take attention away from the first reason. It's a lot easier to fix the system's ability to log in millions of users than it is to fix a fundamental capacity problem with the P2P network itself.
Has Skype gotten so big that when part of its circulatory system comes under stress its entire body starts shutting down? Did Skype have a stroke? If my hunch is correct--that the stability of Skype's network under stressful conditions was the real problem--then you bet, it could all happen again.
I don't claim to be an expert here. I'm no network engineer. I invite anyone who's well-versed in the wonders of P2P to write in and tell me I'm wrong.
UPDATE: 8/20/07 -- 5 p.m. I don't appear to be alone. Check out this wire story we just posted: "Skype Users Don't Buy Outage Explanation."
Well, with more and more users moving to Linux, maybe this won't be a problem in the long run...
TheWitness