Additional Sydney Capacity Added

We’re pleased to announce that we’ve added local peering capacity in Sydney. The new peers give us much better access to both Telstra and Optus, so visitors on these networks within Australia should see noticeably better performance.

 

 

(Really Really) Understanding traceroute

One of the many frustrations I hear frequently from people in the network community is dealing with customers who practice “traceroute engineering” when opening trouble cases. At CacheFly we’re extremely lucky in that we almost never have customers who do this, but the problem certainly exists. Worse yet, 9 times out of 10, people draw fundamentally incorrect assumptions, inferences and conclusions from something as simple as a traceroute.

Enter Richard Steenbergen. One of the truly gifted people out there when it comes to running a network, Richard is the CTO and founder of nLayer. A few years ago Richard gave a fantastic presentation at NANOG on how to *really* read a traceroute, and a bunch of other smart people walked away with a much better understanding of how traceroute actually works. Richard has since converted his presentation into an article which is very easy to digest and yet provides all the info from his original presentation. I’ve included a couple of interesting excerpts below, but for a proper education, be sure to read the full PDF. It’s a total of 20 pages including some graphics, and well worth your time if you ever look at a traceroute for work or pleasure.

 

Queuing Delay

To understand queuing delays, first you must understand the nature of interface utilization. For
example, a 1GE port may be said to be “50% utilized” when doing 500Mbps, but what this actually
means is 50% utilized over some period of time (for example, over 1 second). At any given instant,
an interface can only be either transmitting (100% utilized), or not transmitting (0% utilized).
When a packet is routed to an interface but that interface is currently in use, the packet must be
queued. Queuing is a natural function of all routers, and normally contributes very little latency to
the overall forwarding process. But as an interface approaches the point of saturation, the
percentage of packets which must be queued for significant periods of time increases exponentially.
The amount of queuing delay that can be caused by a congested interface depends on the type of
router. A large carrier-class router typically has a significant amount of packet buffering, and can add
many hundreds or thousands of milliseconds of latency when routing over a congested interface. In
comparison, many enterprise-class devices (such as Layer 3 switches) typically have very small
buffers, and may simply drop packets when congested, without ever causing a significant increase in latency.


Matt: Once you’ve read this once or twice, it makes all the sense in the world. But did you ever really look at a GigE port doing 512kbps and think of it as operating at 100% utilization?
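To put rough numbers on the excerpt, here is a minimal Python sketch. The function names are made up for illustration, and the queuing estimate uses the textbook M/M/1 model with 1500-byte packets. That model is my own simplification, not something taken from Richard's article, so treat the output as an indication of the shape of the curve rather than a prediction for any real router.

```python
LINK_BPS = 1_000_000_000  # 1GE line rate in bits per second


def serialization_time(packet_bytes, link_bps=LINK_BPS):
    """Seconds the port is busy (100% utilized) sending one packet."""
    return packet_bytes * 8 / link_bps


def average_utilization(throughput_bps, link_bps=LINK_BPS):
    """Fraction of the measurement window the port spent transmitting."""
    return throughput_bps / link_bps


def mm1_mean_queue_wait(utilization, packet_bytes=1500, link_bps=LINK_BPS):
    """Mean time a packet waits in queue, ASSUMING an M/M/1 queue.

    Wq = rho / (1 - rho) * service_time. Real routers have finite
    buffers and bursty traffic, so this is only a rough illustration.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    service = serialization_time(packet_bytes, link_bps)
    return utilization / (1 - utilization) * service


# A 1GE port "doing 512kbps" is busy for only a tiny slice of each second,
# but during that slice it is transmitting at full line rate.
print(f"busy fraction at 512kbps: {average_utilization(512_000):.6f}")

# Estimated queue wait as the port approaches saturation.
for rho in (0.5, 0.9, 0.99, 0.999):
    wait_us = mm1_mean_queue_wait(rho) * 1e6
    print(f"{rho:.1%} utilized -> ~{wait_us:.0f} us mean queue wait")
```

Under these assumptions the wait stays negligible at 50% utilization and climbs steeply over the last few percent before saturation, which is the behavior the excerpt describes.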

Asymmetric Routing

One of the most basic concepts of routing on the Internet is that there is absolutely no guarantee of
symmetrical routing of traffic flowing between the same end-points but in opposite directions. Regular
IP forwarding is done by destination-based routing lookups, and each router can potentially have its own
idea about where traffic should be forwarded.
As we discussed earlier, Traceroute is only capable of showing you the forward path between the source
and destination you are trying to probe, even though latency incurred on the reverse path of the ICMP
TTL Exceed packets is part of the round-trip time calculation process. This means that you must also
examine the reverse path Traceroute before you can be certain that a particular link is responsible for
any latency values you observe in a forward Traceroute.
Asymmetric paths most often start at network boundaries, because this is where administrative policies
are most likely to change. For example, consider the following Traceroute:

3 te1-1.ar2.DCA3.gblx.net (69.31.31.209) 0.719 ms 0.560 ms 0.428 ms
4 te1-2-10g.ar3.DCA3.gblx.net (67.17.108.146) 0.574 ms 0.557 ms 0.576 ms
5 sl-st21-ash-8-0-0.sprintlink.net (144.232.18.65) 100.280 ms 100.265 ms 100.282 ms
6 144.232.20.149 (144.232.20.149) 102.037 ms 101.876 ms 101.892 ms
7 sl-bb20-dc-15-0-0.sprintlink.net (144.232.15.0) 101.888 ms 101.876 ms 101.890 ms

 

This Traceroute shows a 100ms increase in latency between Global Crossing in Ashburn VA and Sprint in
Ashburn VA, and you’re trying to figure out why. Obviously distance isn’t the cause for the increased
latency, since these devices are both in the same city. It could be congestion between Global Crossing
and Sprint, but this isn’t guaranteed. After the packets cross the boundary between Global Crossing and
Sprint, the administrative policy is also likely to change. In this specific example, the reverse path from
Sprint to the original Traceroute source travels via a different network, which happens to have a
congested link. Someone looking at only the forward Traceroute would never know this though, which is
why obtaining both forward and reverse Traceroutes is so important to proper troubleshooting.
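As a small illustration of the example above, the following Python sketch parses the traceroute lines from the excerpt and reports where the round-trip time first jumps. The regular expressions and helper names are my own and assume the classic Unix traceroute output format shown here. Note what it cannot do: it only tells you the hop at which the RTT rises, not whether the forward or the reverse path is to blame.

```python
import re

# Traceroute lines copied from the excerpt above.
SAMPLE = """
3 te1-1.ar2.DCA3.gblx.net (69.31.31.209) 0.719 ms 0.560 ms 0.428 ms
4 te1-2-10g.ar3.DCA3.gblx.net (67.17.108.146) 0.574 ms 0.557 ms 0.576 ms
5 sl-st21-ash-8-0-0.sprintlink.net (144.232.18.65) 100.280 ms 100.265 ms 100.282 ms
6 144.232.20.149 (144.232.20.149) 102.037 ms 101.876 ms 101.892 ms
7 sl-bb20-dc-15-0-0.sprintlink.net (144.232.15.0) 101.888 ms 101.876 ms 101.890 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+)\s*ms")


def parse_hops(text):
    """Return a list of (hop_number, hostname, min_rtt_ms) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not a hop
        rtts = [float(x) for x in RTT_RE.findall(match.group(4))]
        if rtts:
            # The minimum of the probes is the least noisy sample.
            hops.append((int(match.group(1)), match.group(2), min(rtts)))
    return hops


def largest_jump(hops):
    """Return (from_host, to_host, delta_ms) for the biggest RTT increase."""
    best = None
    for prev, cur in zip(hops, hops[1:]):
        delta = cur[2] - prev[2]
        if best is None or delta > best[2]:
            best = (prev[1], cur[1], delta)
    return best


src, dst, delta = largest_jump(parse_hops(SAMPLE))
print(f"largest RTT increase: {delta:.3f} ms between {src} and {dst}")
```

The jump lands on the Global Crossing to Sprint hand-off, but because each RTT is the sum of the forward trip and the return trip of the ICMP reply, the script alone cannot say which direction added the latency. That still requires a traceroute from the far end.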

 

Matt: These are just the tip of the iceberg. In the full document, Richard dives into MPLS tunnels, ECMP, serialization and queuing delays, and much more.

 

The full article is located here: http://cluepon.net/ras/traceroute.pdf

Stockholm POP is back up

The Stockholm POP is back up after a hiatus of about a month to re-home to new infrastructure.  Latency in Scandinavia has been greatly improved!

 

 

Hong Kong POP Live

We’re pleased to announce that we’re now live in Hong Kong! Our POP is in NTT’s facility, and we’re fully peered at HKIX.

New Features on the Horizon

In addition to the launch of the new website (finally! I actually had a target date of February 2006 for the relaunch), we’re rolling out some requested features very soon.

On top of the recently added control panel items (manage hostnames, manage object expiry), we’re planning to add the following in the next 2 weeks:

  • Country-based blocking
  • Per-account MIME-type overrides

Additionally, there are quite a few new features on the horizon in the next 3-4 months. Check back here for updates; I’ll be trying to post weekly with development news.

CacheFly Outperforms the Competition

CloudHarmony announced the results of their comprehensive end-user performance testing this week, and (as we expected :) ) CacheFly came out on top!

You can see the methodology and results here, or you can conduct your own speed test.

Sydney POP now live

Our second POP in Australia (the other being Perth) is now live and delivering content. This will greatly improve performance for users in eastern Australia.

Newark POP is now Live

We're now serving bits from Equinix Newark. This should improve performance (slightly) for east coast users, but mostly it adds redundancy to our Ashburn location.

LAX POP Live

As of yesterday, our Equinix LA1 POP is now live and delivering traffic.

Storage Pricing Dropped

Effective immediately, we’ve dropped pricing for excess storage down to $15/GB for all web plans.  This applies to both existing customers and new accounts.

As always, if you’re looking for more aggressive storage pricing, please contact CacheFly Sales.