October 31, 2008 · Engineering

Wrestling With the Zombie: Sprint Depeers Cogent, Internet Partitioned

A special Halloween edition of the Renesys Blog: That which was whole is now torn asunder, and cries of grief ring out throughout the land. Cogent (AS174) and Sprint (AS1239) are no longer connected to each other. Customers of each network who do not have other providers—namely single-homed customers—cannot reach each other. Two large portions of the Internet are separated.

Cogent is frequently involved in peering disputes. In the last three years, the only significant peering dispute (one that caused a temporary partition of the Internet) that did not involve Cogent was between Level 3 and XO, and that one was settled very quickly. All of the others involved Cogent: the Cogent/Telia depeering, Level 3's depeering of Cogent, and disputes going back further with Teleglobe (now Tata, AS6453) and France Telecom (AS5511).

But in this case, Cogent may have picked the wrong sparring partner. In the past, Cogent won peering disputes simply because its customer base was less sensitive to the outage than the other party's. Ultimately, the one whose customers complain the loudest loses. This time may be very different. Sprint hasn’t paid any particular attention to its IP product and network at a senior management level for a very long time. It is clearly focused on wireline and wireless telecom services, and management in Overland Park seems mostly unaware that the company even operates an IP network. In other words, Cogent has picked a fight with a zombie here. Cogent may even rip off a limb or two, but that doesn’t mean the zombie will notice.

Sprint and Cogent only started peering recently, back in November of 2006. Before that, the two networks reached each other via NTT Communications (AS2914). Now, almost exactly two years later, it appears that Sprint has disconnected Cogent and chosen to divide the Internet. Cogent has stated that it will litigate this issue, so this one is unlikely to be resolved quickly. In the meantime, over 200 downstream autonomous system customers of each organization cannot reach the networks on the other side. This is ugly and will remain so.

Let’s take a quick look at what we know so far and set the stage for a story that will likely continue for several days, if not weeks. I’ll also try to put this in the larger context of how each of these networks, and Internet interconnection as a whole, has evolved.

Timeline: Cogent lost access to Sprint’s prefixes between 20:00:11 UTC (4pm EDT yesterday, 30 October 2008) and 20:00:22. Sprint lost access to Cogent’s prefixes between 20:00:22 and 20:00:27. The timing, right on the hour, suggests that the event was human-initiated. After the adjacency was lost, the only workable paths between AS174 and AS1239 were leaks: unintentional readvertisements of paths. For example:

  • 1239 6327 19752 19752 27168 27168 577 174

That is not a good path. There are just so many things wrong with it. The large length. The number of hops between AS1239 and AS174. The fact that AS577 (Bell Canada) appears in the role of transit provider to Cogent. The fact that Shaw Communications (AS6327) shows up as providing transit for Sprint. Aside from clear, Canadian mistakes like this, there’s really no reachability between the two organizations.
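A rough way to make “that is not a good path” precise is the valley-free model commonly used to reason about BGP export policy: a route may climb from customer to provider, cross at most one peering link, and then only descend toward customers. The sketch below first collapses AS-path prepending (19752 and 27168 each appear twice in the leaked path) and then checks that pattern. The relationship tables are invented for illustration, including the assumption that before 2006 Cogent bought transit from NTT; we do not know the real business relationships among these ASes.

```python
from typing import Dict, List, Tuple

# Relationship of AS b as seen by AS a, when a announces a route to b:
#   "c2p": b is a's provider (the route climbs)
#   "p2p": a and b are settlement-free peers
#   "p2c": b is a's customer (the route descends)
Relationships = Dict[Tuple[int, int], str]


def collapse_prepends(path: List[int]) -> List[int]:
    """Drop consecutive repeats of the same ASN (AS-path prepending)."""
    hops: List[int] = []
    for asn in path:
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return hops


def is_valley_free(path: List[int], rel: Relationships) -> bool:
    """Check the pattern c2p* p2p? p2c*, walking from the origin outward.

    `path` is written the way a router displays it: receiving AS first,
    origin AS last. Unknown relationships count as violations.
    """
    hops = collapse_prepends(path)[::-1]  # origin first
    climbing = True  # still allowed to go up or cross one peer link
    for a, b in zip(hops, hops[1:]):
        kind = rel.get((a, b))
        if kind == "c2p" and climbing:
            continue
        if kind == "p2p" and climbing:
            climbing = False  # only one peer link, at the top
        elif kind == "p2c":
            climbing = False
        else:
            return False  # climbing after the top, a second peer link, or unknown
    return True


# Invented relationships, for illustration only (NOT the real business relationships).
LEAK_REL: Relationships = {
    (174, 577): "p2p",
    (577, 27168): "p2c",
    (27168, 19752): "p2c",
    (19752, 6327): "c2p",  # a customer re-announcing one provider's route to another
    (6327, 1239): "p2p",
}
# Assumed pre-2006 arrangement: Cogent buys transit from NTT, NTT peers with Sprint.
NTT_REL: Relationships = {(174, 2914): "c2p", (2914, 1239): "p2p"}

if __name__ == "__main__":
    leaked = [1239, 6327, 19752, 19752, 27168, 27168, 577, 174]
    print(collapse_prepends(leaked))                   # [1239, 6327, 19752, 27168, 577, 174]
    print(is_valley_free(leaked, LEAK_REL))            # False
    print(is_valley_free([1239, 2914, 174], NTT_REL))  # True
```

With these made-up labels, the leaked path fails at the point where a customer passes a route learned from one provider up to another provider, which is the classic shape of a route leak.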

One common reaction to this schism is: “So what? This only affects people who are single-homed to either Cogent or Sprint, and given all of the high-profile depeering that has gone on over the past three to five years, how many people can that seriously be?”

Quite a few, it turns out. 289 autonomous systems have thrown caution to the wind and are completely single-homed behind Cogent (that is, they have no connectivity to the Internet through anyone else), and 214 autonomous systems are completely single-homed behind Sprint. These numbers significantly understate the impact of this outage on the Internet, though. Because of Cogent’s aggressive pricing, a large number of service providers are multi-homed but default all of their outbound traffic through Cogent. This is true for Renesys’s deployment in Boston, and it’s also true for a number of other ISPs. In these cases, although those ASes and prefixes show up as unaffected, traffic from their users bound for Sprint-connected users will simply not get through.
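Counting ASes that are “completely single-homed” behind a network amounts to asking which ASes lose every upstream path to the rest of the Internet once that network is taken away. A minimal sketch, assuming a provider map (`providers[x]` is the set of transit providers of AS x) built from inferred AS relationships, and treating ASes with no providers as the top of the hierarchy. Apart from 174, 1239 and 3356, the ASNs in the example come from the private-use range and the whole graph is hypothetical.

```python
from collections import deque
from typing import Dict, Set


def single_homed_behind(providers: Dict[int, Set[int]], target: int) -> Set[int]:
    """ASes whose every path up to the top of the hierarchy goes through `target`.

    `providers[x]` is the set of transit providers of AS x. ASes with no
    providers (e.g. a tier-1 network) are treated as the top.
    """
    all_ases = set(providers) | {p for ups in providers.values() for p in ups}
    tops = {a for a in all_ases if not providers.get(a)}
    stranded: Set[int] = set()
    for asn in all_ases - tops - {target}:
        seen, queue = {asn}, deque([asn])
        reaches_target = escapes = False
        while queue and not escapes:
            for up in providers.get(queue.popleft(), set()):
                if up == target:
                    reaches_target = True  # never climb through the target
                elif up in tops:
                    escapes = True  # another way out of the target's cone exists
                elif up not in seen:
                    seen.add(up)
                    queue.append(up)
        if reaches_target and not escapes:
            stranded.add(asn)
    return stranded


# Hypothetical provider graph, for illustration only.
PROVIDERS: Dict[int, Set[int]] = {
    64512: {174},          # single-homed to Cogent
    64513: {174, 1239},    # multi-homed to both
    64514: {64512},        # customer of a Cogent-only AS
    64515: {64512, 3356},  # has a second way out via AS3356
    64516: {1239},         # single-homed to Sprint
}

if __name__ == "__main__":
    print(sorted(single_homed_behind(PROVIDERS, 174)))   # [64512, 64514]
    print(sorted(single_homed_behind(PROVIDERS, 1239)))  # [64516]
```

Note that AS64514 is stranded even though it is not a direct Cogent customer, while AS64515 is not. This AS-graph view also cannot see the multi-homed networks that default their outbound traffic through Cogent; those look safe here but still cannot reach Sprint.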

Another way to look at the scope of this event is to identify the number, size and ownership of the network prefixes affected by the outage. The most common way of measuring the size of a network is to look at all of the prefixes in its “downstream cone”, that is, the set of networks that are transitively downstream of a given ASN. Sprint has approximately 100,000 prefixes in its downstream cone, of which at least 1,989 are single-homed (they are not advertised in such a way that they are reachable via any other provider). Cogent has over 30,000 prefixes in its downstream cone, of which at least 1,544 are single-homed. So, in total, at least 3,500 networks on the Internet (1,989 + 1,544 = 3,533) have less than full connectivity right now. But for the reasons cited above, the impact is probably significantly worse than that. We’re also refining our analysis of which prefixes count as “single-homed” in this case, to see whether we can identify prefixes that are missing transit even when it appears that they are not. Expect a follow-up post on this issue.
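The downstream cone and the single-homed prefix count can be sketched as two separate computations: a breadth-first walk over provider-to-customer edges, and a check that every observed AS path toward a prefix crosses the network in question. This is a minimal sketch, not Renesys’s actual method; the customer map, prefix origins and observed paths are hypothetical inputs (documentation prefixes and private-use ASNs), and the path-based test is only as good as the set of vantage points feeding it.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_cone(customers: Dict[int, Set[int]], root: int) -> Set[int]:
    """All ASes transitively downstream of `root`, including `root` itself."""
    cone, queue = {root}, deque([root])
    while queue:
        for child in customers.get(queue.popleft(), set()):
            if child not in cone:
                cone.add(child)
                queue.append(child)
    return cone


def cone_prefixes(customers: Dict[int, Set[int]], origins: Dict[str, int], root: int) -> Set[str]:
    """Prefixes originated by any AS in `root`'s downstream cone."""
    cone = downstream_cone(customers, root)
    return {prefix for prefix, origin in origins.items() if origin in cone}


def single_homed_prefixes(paths: Dict[str, List[List[int]]], via: int) -> Set[str]:
    """Prefixes for which every observed AS path contains `via`."""
    return {p for p, seen in paths.items() if seen and all(via in path for path in seen)}


# Hypothetical inputs, for illustration only.
CUSTOMERS = {174: {64512, 64513}, 64512: {64514}, 1239: {64513, 64516}}
ORIGINS = {
    "192.0.2.0/24": 64512,
    "198.51.100.0/24": 64514,
    "203.0.113.0/24": 64513,
    "2001:db8::/32": 64516,
}
PATHS = {  # AS paths as seen from two made-up vantage points
    "192.0.2.0/24": [[3356, 174, 64512], [2914, 174, 64512]],
    "203.0.113.0/24": [[3356, 174, 64513], [2914, 1239, 64513]],
}

if __name__ == "__main__":
    print(sorted(cone_prefixes(CUSTOMERS, ORIGINS, 174)))
    print(single_homed_prefixes(PATHS, 174))  # {'192.0.2.0/24'}
```

Here 203.0.113.0/24 sits in both cones but in neither single-homed set, which is why cone size alone overstates the damage and why single-homed prefixes are counted separately.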

One might suspect that these single-homed autonomous systems are simply incautious or insignificant networks. After all, given the history of Internet partitions, who would be rash enough to have important network services located on a single-homed prefix in this day and age?

The following prefixes are some of the more interesting networks single-homed behind Sprint:

  • 208.95.96.0/21 Expedia, Inc.
  • 164.62.0.0/16 Federal Trade Commission
  • 204.108.8.0/24 Federal Aviation Administration
  • 198.9.201.0/24 National Aeronautics and Space Administration
  • 170.189.200.0/24 Occidental Petroleum Corporation
  • 148.168.0.0/16 Pfizer Inc.
  • 128.6.0.0/16 Rutgers University
  • 173.100.0.0/16 Sprint PCS (lots of networks here, of course)
  • 149.24.174.0/23 SUNGARD HIGHER EDUCATION INC.

And those are just a few.

The following prefixes are some of the more interesting networks single-homed behind Cogent:

  • 89.251.2.0/24 Joost Production Benelux Network
  • 72.5.224.0/24 Loopt, Inc.
  • 198.185.178.0/23 National Aeronautics and Space Administration
  • 204.201.48.0/21 NTT America, Inc. (and many more like it, from the T1 and hosting customers acquired from NTT/Verio)
  • 204.9.56.0/24 Skynet Access (this might actually be good news, if the loss of connectivity to Skynet prevents or delays sentience).
  • 142.155.0.0/16 St. Lawrence College
  • 128.100.0.0/16 University of Toronto (and a bunch of other colleges and universities)
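To check whether a particular address falls inside one of the prefixes listed above, a longest-prefix match against those two lists is enough. This sketch uses Python’s standard `ipaddress` module and only the prefixes named in this post, which are explicitly a partial sample. An address that matches nothing here may still be stranded, and a real check would need current BGP data.

```python
import ipaddress
from typing import Dict, List, Optional

# Prefixes copied from the two lists in this post (a partial sample only),
# keyed by the network they were reported as single-homed behind.
SINGLE_HOMED: Dict[str, List[str]] = {
    "Sprint (AS1239)": [
        "208.95.96.0/21", "164.62.0.0/16", "204.108.8.0/24", "198.9.201.0/24",
        "170.189.200.0/24", "148.168.0.0/16", "128.6.0.0/16", "173.100.0.0/16",
        "149.24.174.0/23",
    ],
    "Cogent (AS174)": [
        "89.251.2.0/24", "72.5.224.0/24", "198.185.178.0/23", "204.201.48.0/21",
        "204.9.56.0/24", "142.155.0.0/16", "128.100.0.0/16",
    ],
}

# Parse every prefix once.
TABLE = [(ipaddress.ip_network(p), side) for side, ps in SINGLE_HOMED.items() for p in ps]


def stranded_side(addr: str) -> Optional[str]:
    """Return the side whose listed prefix most specifically covers `addr`, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, side in TABLE:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, side)
    return best[1] if best else None


if __name__ == "__main__":
    for addr in ("128.6.1.1", "128.100.5.5", "198.185.179.10", "8.8.8.8"):
        print(addr, stranded_side(addr))
```

None of the listed prefixes overlap, so a plain membership test would give the same answers today; the longest-prefix rule only matters once more specific entries are added.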

Notice NASA single-homed on both sides of this division? I have no idea what that is about. The point here is that this is a big deal. There are lots of significant organizations that appear to have lost connectivity due to this dispute.

This dispute is unlikely to be resolved quickly. We’ll revisit it over the course of the weekend and into next week to see how it develops. In particular, it will be interesting to watch the public positioning from both parties, including whether Sprint issues any kind of a statement or indicates any attention to the matter at all. If Sprint really doesn’t care, then Cogent will lose.

21 Responses to Wrestling With the Zombie: Sprint Depeers Cogent, Internet Partitioned

  1. Sprint vs. Cogent

    As you may or may not be aware, Sprint and Cogent have engaged in a sparring match over peering. The two networks are currently isolated from each other. I recommend reading this interesting summary of the situation: Wrestling With the Zombie: Sprint D…

  2. Sprint blocking Cogent network traffic…

    It appears that Sprint has stopped routing traffic (called “depeering”) from Cogent as a result of some sort of legal dispute. Sprint customers cannot reach Cogent customers, and vice versa. The effect is similar to what would happen if Sprint were to …

  3. Great analysis, as always!
    I included an excerpt of this and a link on the O’Reilly Radar.
    -Jesse Robbins

  4. One thing I would really love to see is a BGPlay-style animation showing the extent of the problem.

  5. TWC says:

    Everything I have read says the connectivity issue is between Sprint customers and Cogent single-homed hosts, but that is not really the case. My ISP, Wide Open West in Chicago, is routing packets to Cogent-connected hosts via Level 3, and there is supposed to be peering between Level 3 and Cogent, but it looks like the packets are being routed off to routers in the 207.180 network and then disappear altogether (timeouts).
    The Internet monitor site reports good connectivity between Level 3 and Cogent, but my guess is that they are not routing across that connection for some reason.

  6. James says:

    This must piss off Google as much as it pisses off all the ISP customers which just got partitioned. Sprint has shown they aren’t mature enough to be trusted with tier-1.
    I hope this is what it takes to get the Goog to light up all that dark fiber they’ve been collecting and cross-peer with the tier-1s and soon-to-be-former Sprint customers who care more about their connectivity than petty internecine squabbles about uneven traffic.

  7. Thomas Sears says:

    Thanks for your explanation and research. I am a small virtual host that cannot connect with my website(s) and am not getting email. I have a suggestion for other opensource hosting clients: contact a class action attorney if you are unable to connect… if this continues as long as it appears it might, we are SOL and are going to lose a lot of business.

  8. You are also now linked to from two threads at the Sprint-Nextel discussion forums, http://forums.buzzaboutwireless.com/baw/board/message?board.id=news&message.id=3193
    Excellent initial analysis!

  9. JP says:

    “Aside from clear, Canadian mistakes like this, there’s really no reachability between the two organizations.”
    It’s a mistake that Canadian ISPs provide transit to the customers that pay for it? Are you serious? What kind of xenophobic pseudotechnical trash is that? You just ruined the entire article.

  10. Clueless James says:

    “Sprint has shown they aren’t mature enough to be trusted with tier-1.” This is most definitely an ignorant statement. Since you use Google so much, why don’t you google “Sprint Cogent peering”? Plenty of articles state this isn’t the first time Cogent has had issues peering with other ISPs. It seems Cogent is the one that isn’t mature enough to be trusted with tier-1 services, trying to take advantage of other ISPs. As Underwood himself said, “Cogent is frequently involved in peering disputes.” This is obviously a blog based simply on opinion, as Underwood didn’t even talk to either side about why ties were severed in the first place. If you have issues connecting between Cogent and Sprint, open an issue, case or ticket with your ISP, as one of the other articles I have read on this stated. This sucks for all involved!

  11. pamela says:

    I am pissed. As a Sprint customer, I can no longer access my webhost/website, which I have had for years!!

  12. You’re just being silly. Sprint doesn’t pay Shaw for transit. Cogent doesn’t pay Bell Canada for transit. It’s just a routing leak. These things happen. The “Canadian” part of the comment was a joke about the fact that both sides of the routing leak happen to be in Canada. The comment was neither pseudotechnical nor xenophobic.

  13. David says:

    “But in this case, Cogent may have picked the wrong sparring partner”
    I read a few articles about this peering dispute that have insinuated that Cogent “picked a peering fight” or a “sparring partner”. I think this is unfair, since Sprint is the party doing the de-peering. If anything, Cogent is standing up to the bully. Sprint is clueless, since they are trying to bully a company with a network that is three times bigger.

  14. Kevin says:

    @ David
    Not hardly. Sprint has a larger IP network and has been an actual Tier 1 provider since NSF days. Cogent has been very aggressive in the last few years as they picked up fiber and net assets for pennies on the dollar. Their traffic HAS been asymmetric in their favor and that is why they have had such a series of fights with existing peers such as Level 3, TeliaSonera, FT and now Sprint.
    Peering is actually a rather fragile economic model for such an important utility to modern life to be dependent upon (the core Internet). Its underlying assumption is that the traffic flows from each network are approximately in balance over time and that each party has about the same bargaining power.
    As I understand it, if things break down between peers, there is only one option – depeering. There does not appear to be any more gradual way to deal with the issue.

  15. dts says:

    Well, I can confirm that Sprint EVDO network is affected. I can also guess that email-to-SMS gateways are as well, for those who use their Sprint phones as pagers and now won’t get alerts sent from Cogent-only sites.
    This raises the very real question as to whether Sprint’s EVDO service marketing/sales materials lie. They say something to the effect of “unlimited access to the Internet from anywhere” and yet they are delivering access to parts of the Internet and by this de-peering action blocking substantial numbers of sites. Sony, and others, who sell notebooks with the EVDO stuff built in might have to reconsider their sales claims as well.
    I’m working from a mobile setup as I’m travelling. Indeed, I find that my Sprint EVDO card fails to reach many sites. I was already starting to think about changing vendors due to billing issues with Sprint. Now they’re giving me more reasons to move to a different carrier for wireless data and for mobile telephone service. My small business doesn’t have enough service with them for them to care, though I do wonder if others might have more impact.
    Interesting that this all comes at a time when Sprint is trying to get their WiMax service off the ground. WiMax service is being pitched as an alternative to DSL and cable, with the ability to reach places where wires don’t reach as a bonus. We’ve already seen Comcast get in trouble for restricting end user broadband access. Will Sprint be similarly reprimanded?
    I’m sure Sprint’s management weighed all of these issues and impacts though when they decided to de-peer Cogent.

  16. James says:

    Opening a ticket is not going to solve this problem. Sending Vint Cerf back up on Capitol Hill with a realistic scenario of what would happen to the economy if this practice spreads, and legislative language to severely fine any tier-1 or tier-2 who de-peers because of asymmetric bandwidth without at least 90 days notice to all affected customers and peers will go a long way towards a solution.
    Or, perhaps the Canadians have more of a solution than a problem. Allow failover tier-1-to-tier-1 transit?
    This is damage. Let the routing around begin!

  17. http://fb.sp

    Two articles (in English) on the partitioning of the Internet, which has been discussed very little in Italy, perhaps because the effects here are still barely visible: Renesys: Wrestling With the Zombie: Sprint Depeers Cogent, Internet Partitioned (pessimistic)

  18. Link says:

    Thing is, most people expect things to “just work”, not caring why.
    So most people blame Sprint because Sprint disconnected Cogent, not caring that Cogent was violating traffic ratios.
    Cogent violating traffic ratios = consumers don’t care to even know about it.
    Sprint disconnecting Cogent = Consumers blame Sprint for “starting it”.
    Cogent is trying to take advantage of ignorant consumers.

  19. Elfin Magic says:

    I work at a law firm that is on Cogent. The U.S. federal court system (uscourts.gov) is apparently on Sprint. We cannot file any pleadings and we are not receiving any filing notices via email. Not sure how we’re going to handle this issue, but it clearly needs to be fixed soon.
    As for me, it’s probably the final nail in the coffin for my Sprint-based smartphone.

  20. The dispute has apparently been resolved. Cogent started seeing Sprint routes again at exactly 21:00:45 UTC, Sunday November 2. Moments later, at 21:01:01 UTC, Sprint started seeing Cogent routes. A more detailed analysis will follow in a separate blog post.

  21. Here is Sprint’s version of the current situation.
    I view this mostly as a depeering failure by Sprint. “Temporary reconnection” is not how you play the depeering game. Someone should send Overland Park a memo or offer them a more detailed “Peering/Depeering playbook”.
    Maybe they’re relying on the court case?
