
Deep packet inspection meets ‘Net neutrality, CALEA

Deep packet inspection provides the tools that ISPs need to throttle, cap, and …

Nate Anderson – Jul 26, 2007 3:10 AM

Throttle me this: An introduction to DPI

Imagine a device that sits inline in a major ISP's network and can throttle P2P traffic at differing levels depending on the time of day. Imagine a device that allows one user access only to e-mail and the Web while allowing a higher-paying user to use VoIP and BitTorrent. Imagine a device that protects against distributed denial of service (DDoS) attacks, scans for viruses passing across the network, and siphons off requested traffic for law enforcement analysis. Imagine all of this being done in real time, for 900,000 simultaneous users, and you get a sense of the power of deep packet inspection (DPI) network appliances.

Although the technology isn't yet common knowledge among consumers, DPI already gives network neutrality backers nightmares and enables American ISPs to comply with CALEA (government-ordered Internet wiretaps) reporting requirements. It also just might save the Internet (depending on who you believe).

Ars recently had the chance to talk with executives from DPI vendors Ellacoya and Procera Networks about their offerings and how they are already being deployed around the world, and we got a look at the newest boxes on offer from each company. Their top-of-the-line products can set you back several hundred thousand dollars, but some of them can inspect and shape every single packet—in real time—for nearly a million simultaneous connections while handling 10-gigabit Ethernet speeds and above.

That's some serious horsepower, and when major ISPs deploy these products in their networks, they suddenly know a whole lot more about their users and their traffic. They also gain the ability to block, shape, monitor, and prioritize that traffic—in any direction. That makes it suddenly simple to, say, prioritize all incoming traffic from any web site that has handed over a briefcase stuffed with unmarked bills while leaving every other site to fight its way through the tubes as best it can.

It also becomes trivial to start blocking or actively degrading services that a company dislikes—like VoIP, for example. Not that this would ever happen. But that's not how the technology is marketed, and there's little evidence that it's currently being used this way. DPI is generally sold on the premise that network operators can control entire classes of traffic (P2P, VoIP, e-mail, etc.) on a group or per-user basis. Let's take a look at how that happens and what it means for both network neutrality and legal interception (CALEA) compliance.

Inspecting packets, deeply

The "deep" in deep packet inspection refers to the fact that these boxes don't simply look at the header information as packets pass through them. Rather, they move beyond the IP and TCP header information to look at the payload of the packet. The goal is to identify the applications being used on the network, but some of these devices can go much further; those from a company like Narus, for instance, can look inside all traffic from a specific IP address, pick out the HTTP traffic, then drill even further down to capture only traffic headed to and from Gmail, and can even reassemble e-mails as they are typed out by the user.

But this sort of thing goes beyond the general uses of DPI, which is much more commonly used for monitoring and traffic shaping. Before an ISP can shape traffic, it must know what's passing through its system. Without DPI, that simple-sounding job can be all but impossible. "Shallow" packet inspection might provide information on the origination and destination IP addresses of a particular packet, and it can see what port the packet is directed towards, but this is of limited use.

Shallow inspection doesn't help much with modern applications, especially with those designed to get through home and corporate firewalls with a minimum of trouble. Such programs, including many P2P applications and less-controversial apps like Skype, can use many different ports; some can even tunnel their traffic through entirely different protocols.

So looking at the port doesn't give ISPs enough information anymore, and looking just at the IP address can't identify P2P traffic, for instance. Even for applications like web browsers that consistently use port 80, more information is needed. How much of that HTTP traffic is video? Ellacoya, which recently completed a study of broadband usage, says that 20 percent of all web traffic is really just YouTube video streams.

This is information an ISP wants to know; at peak hours, traffic shaping hardware might downgrade the priority of all streaming video content from YouTube, giving other web requests and e-mails a higher priority without making YouTube inaccessible.
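As a rough illustration of the kind of rule described here, the sketch below lowers the priority of streaming video during peak hours without blocking it. The peak window (6 PM to 11 PM), the priority numbers, and the traffic-class names are all assumptions made up for the example, not settings from any vendor's product.

```python
from datetime import time

# Hypothetical peak window and priority levels (lower number = served first).
PEAK_START = time(18, 0)
PEAK_END = time(23, 0)
BASE_PRIORITY = {"email": 1, "web": 1, "streaming_video": 1}
PEAK_PENALTY = {"streaming_video": 2}  # only video is demoted at peak


def is_peak(now: time) -> bool:
    """Return True when `now` falls inside the assumed peak window."""
    return PEAK_START <= now < PEAK_END


def priority(traffic_class: str, now: time) -> int:
    """Priority for a classified flow; video drops behind web and e-mail at peak."""
    base = BASE_PRIORITY.get(traffic_class, 1)
    if is_peak(now):
        return base + PEAK_PENALTY.get(traffic_class, 0)
    return base
```

Note that the video flows are still forwarded; they simply wait behind anything with a lower priority number when the link is busy.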

OSI layer model

This only works if the packet inspection is "deep." In terms of the OSI layer model, this means looking at information from layers 4 through 7, drilling down as necessary until the nature of the packet can be determined. For many packets, this requires a full layer 7 analysis, opening up the payload and attempting to determine which application generated it (DPI gear is generally built as a layer 2 device that is transparent to the rest of the network).
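To make "opening up the payload" concrete, here is a minimal sketch of walking past the layer 3 and layer 4 headers of a raw IPv4/TCP packet to reach the application data. The function name `tcp_payload` is hypothetical, and the code assumes a well-formed, unfragmented packet with no link-layer framing; real gear also has to cope with IPv6, UDP, fragmentation, and malformed input.

```python
import struct


def tcp_payload(ip_packet: bytes) -> tuple[int, int, bytes]:
    """Return (source port, destination port, payload) for an IPv4/TCP packet."""
    # Layer 3: the low 4 bits of the first byte give the IPv4 header length
    # in 32-bit words; byte 9 is the protocol number (6 means TCP).
    ihl = (ip_packet[0] & 0x0F) * 4
    if ip_packet[0] >> 4 != 4 or ip_packet[9] != 6:
        raise ValueError("not an IPv4/TCP packet")

    # Layer 4: ports are the first two 16-bit fields of the TCP header, and
    # the high 4 bits of byte 12 give the TCP header length in 32-bit words.
    tcp = ip_packet[ihl:]
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    data_offset = (tcp[12] >> 4) * 4

    # Everything after the TCP header is application data (layer 7).
    return src_port, dst_port, tcp[data_offset:]
```

A "shallow" inspector would stop after reading the addresses and ports; a deep one keeps going and hands the returned payload to an application classifier.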

Procera explains the need for this approach in marketing materials, saying that "layer 7 identification is a necessity today when most client software, like P2P file sharing, is customizable to communicate over any given port to avoid traditional port-based firewalls and traffic management systems."

But how does this work? Data packets don't often contain metadata saying that they were generated from eDonkey; the DPI appliances need to figure this out. In real time. For hundreds of thousands of simultaneous connections.

Peeking beneath the 7th layer

Layer 7 is the application layer, the actual messages sent across the Internet by programs like Firefox or Skype or Azureus. By stripping off the headers, deep packet inspection devices can use the resulting payload to identify the program or service being used. Procera, for instance, claims to detect more than 300 application protocol signatures, including BitTorrent, HTTP, FTP, SMTP, and SSH. Ellacoya reps tell Ars that their boxes can look deeper than the protocol, identifying particular HTTP traffic generated by YouTube and Flickr, for instance. Of course, the identification of these protocols can be used to generate traffic shaping rules or restrictions.

Much like virus scanners, the boxes generally make use of "application signatures"—telltale ways of sending and receiving information that can be used to link a particular packet with a particular application. Procera's version is called Datastream Recognition Definition Language, and just like virus signatures, DPI gear needs regular updates to stay on top of new developments.

DPI vendor Allot Communications has produced a nice whitepaper that describes the different forms that this signature analysis can take. Port analysis is the simplest way to identify an application, but as we've already mentioned, it's notoriously inaccurate. Adding string matches can help, but not all applications use identifiable strings of characters. Kazaa does so, however, embedding its own name in the "user-agent" field of HTTP GET requests. Searching packets for the string "Kazaa" can turn up these requests and let the ISP know that a particular user currently has the application running. Numerical properties are another good way to craft application signatures, using patterns like payload length or specific response sequences.
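The three kinds of evidence described above (port, string match, and numerical or byte-pattern properties) can be sketched as a tiny classifier. This is an illustration under stated assumptions, not Allot's or anyone else's actual signature format: the Kazaa rule follows the description above, the BitTorrent rule relies on that protocol's handshake (a length byte of 19 followed by the text "BitTorrent protocol"), and the port table and function name are hypothetical.

```python
import re

# Illustrative signatures only; real products ship far larger, updated sets.
KAZAA_USER_AGENT = re.compile(
    rb"^User-Agent:[^\r\n]*Kazaa", re.IGNORECASE | re.MULTILINE
)
BITTORRENT_HANDSHAKE = b"\x13BitTorrent protocol"
WELL_KNOWN_PORTS = {21: "ftp", 22: "ssh", 25: "smtp", 80: "http"}


def classify(dst_port: int, payload: bytes) -> str:
    """Guess the application behind a packet, most specific test first."""
    # String match: look for the application's name in an HTTP header.
    if payload.startswith(b"GET ") and KAZAA_USER_AGENT.search(payload):
        return "kazaa"
    # Byte-pattern match: a fixed length prefix followed by a fixed string.
    if payload.startswith(BITTORRENT_HANDSHAKE):
        return "bittorrent"
    # Port analysis: the weakest evidence, used only as a fallback.
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")
```

Because the payload tests run before the port lookup, a BitTorrent handshake sent to port 80 is still labeled as BitTorrent rather than as web traffic, which is exactly the case that port analysis alone gets wrong.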

Looking this closely into packets can raise privacy concerns: can DPI equipment peek inside all of these packets and assemble them into a legible record of your e-mails, web browsing, VoIP calls, and passwords? Well, yes, it can. In fact, that's exactly what companies like Narus use the technology to do, and they make a living out of selling such gear to the Saudi Arabian government, among many others.

Texas disaster recovery and managed services company Data Foundry objects to network operators doing this deep level of inspection. In a recent FCC filing, the company charged that "broadband providers' AUP/TOS/Privacy Policies, in combination with Deep Packet Inspection, allow intrusive monitoring of the content and information customers transmit or receive. This contractual and technical capability interferes with and may well eliminate all sorts of privileges presently recognized under law... Broadband service providers have no justifiable reason to capture this information."

But vendors like Ellacoya and Procera aren't so interested in capturing private data, and it's not the focus of their devices. An Ellacoya rep reassures me that most applications can be identified without actually looking through all the data in a packet payload. Still, concern over the technology has been growing as its rollout has accelerated.

DPI can also be used to root out viruses passing through the network. While it won't cleanse affected machines, it can stop packets that contain proscribed byte sequences. It can also identify floods of information characteristic of denial of service attacks and can then apply rules to those packets.
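Both ideas in this paragraph are simple to sketch. The code below checks a payload for proscribed byte sequences and counts packets per destination over a sliding window to spot a flood. The byte sequences, thresholds, and class names are placeholders invented for the example; a real appliance would use vendor-supplied signatures and far more careful flood heuristics.

```python
from collections import defaultdict, deque

# Placeholder byte sequences; real products ship their own signature sets.
BLOCKED_SEQUENCES = [b"\xde\xad\xbe\xef", b"EVIL-PAYLOAD"]


def contains_blocked_sequence(payload: bytes) -> bool:
    """True if the payload contains any proscribed byte sequence."""
    return any(seq in payload for seq in BLOCKED_SEQUENCES)


class FloodDetector:
    """Flags a destination once it receives too many packets in a short window."""

    def __init__(self, max_packets: int, window_seconds: float) -> None:
        self.max_packets = max_packets
        self.window_seconds = window_seconds
        self._arrivals = defaultdict(deque)  # destination IP -> recent timestamps

    def observe(self, dst_ip: str, timestamp: float) -> bool:
        """Record one packet and return True if dst_ip is now over the limit."""
        arrivals = self._arrivals[dst_ip]
        arrivals.append(timestamp)
        # Forget packets that have aged out of the sliding window.
        while arrivals and arrivals[0] <= timestamp - self.window_seconds:
            arrivals.popleft()
        return len(arrivals) > self.max_packets
```

Once `observe` returns True, the box can apply whatever rule the operator has configured for that destination, such as dropping or rate-limiting the offending packets.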

Some of these things can be done by looking at a single packet, but many cannot. DPI gear can generally extract information from traffic that varies by application type: IP addresses and URLs from HTTP traffic, SIP numbers from VoIP calls, filenames of P2P files, and chat channels for instant messages. Grabbing this information requires a look at a whole set of initial packets until the necessary information is gained, referred to as examining the "flow." Procera in particular makes a big deal about this, referring to their technology as "deep flow inspection" rather than deep packet inspection.
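A minimal sketch of what examining the "flow" means in practice: keep the first few kilobytes of each connection, keyed by its endpoints, and retry the extraction as each packet arrives. The 4,096-byte cap, the class and function names, and the choice of the HTTP Host header as the thing to extract are all assumptions for the example. The sketch also assumes packets arrive in order, which real TCP traffic does not guarantee.

```python
import re
from collections import defaultdict

HOST_HEADER = re.compile(rb"\r\nHost: ([^\r\n]+)\r\n")
MAX_INSPECTED_BYTES = 4096  # assumed cap; only the start of a flow is kept


class FlowTable:
    """Accumulates the first bytes of each one-way flow, keyed by its endpoints."""

    def __init__(self) -> None:
        self._prefix = defaultdict(bytes)

    def add_packet(self, key: tuple, payload: bytes) -> bytes:
        """Append payload to the flow's stored prefix and return the prefix."""
        prefix = (self._prefix[key] + payload)[:MAX_INSPECTED_BYTES]
        self._prefix[key] = prefix
        return prefix


def http_host(prefix: bytes):
    """Return the Host header value once enough of the request has arrived."""
    match = HOST_HEADER.search(prefix)
    return match.group(1).decode("ascii", "replace") if match else None
```

If a request header is split across two packets, neither packet contains the full header on its own, but the accumulated prefix does. That is the difference between inspecting packets and inspecting flows.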

Nickel-and-diming?

All of this technology can be applied in a highly granular fashion. Surveillance rules can be created that are specific to each individual subscriber, and traffic shaping and quality of service can also be applied differently to every connection in the network. Without this sort of individual shaping technology, it has generally been easiest for ISPs to simply offer subscribers unfettered access to the Internet. Bandwidth caps are simple to implement without using DPI, but DPI does make it simple to tier levels of service: purchasing access to the web but not to VoIP, for instance. Based on the capabilities I've been describing, this sort of thing can go even further, with companies marketing low-cost data plans that might include web access except for streaming video, or VoIP calls but no online gaming.

Such scenarios aren't a fantasy; they're happening right now. In the US, Internet access is still generally sold as all-you-can-eat, with few restrictions on the types of services or applications that can be run across the network (except for wireless, of course), but things are different across the pond. In the UK, ISP plus.net doesn't even offer "unlimited" packages, and they explain why on their web site.

"Most providers claiming to offer unlimited broadband will have a fair use policy to try and prevent people over-using their service," they write. "But if it's supposed to be unlimited, why should you use it fairly? The fair use policy stops you using your unlimited broadband in an unlimited fashion—so, by our reckoning, it’s not unlimited. We don't believe in selling 'unlimited broadband' that's bound by a fair use policy. We'd rather be upfront with you and give you clear usage allowances, with FREE overnight usage."

Plus.net's plans

What that means in this case is that you pay by the gigabyte and by the service. Plans start at £9.99 (around $20) a month for just 1GB of data, though use after 10 PM appears not to count toward this quota. The lowest price tier also does not support gaming and places severe speed controls on FTP and P2P use (allowing only 50Kbps at peak periods). Plus.net says that the lowest tier will not work adequately with online games or corporate VPNs. Paying £29.99 (around $60) a month provides 40GB of data transfer and fast P2P and FTP speeds, along with 240 VoIP minutes from the company. All of these tiers feature download speeds of up to 8Mbps.
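To show how tiers like these could be expressed as shaping rules, here is a hedged sketch that encodes the two plans described above. The prices, quotas, 50Kbps cap, and 8Mbps line rate come from the plan descriptions; the end of the free overnight window (8 AM), the definition of "peak" (4 PM to 10 PM), and the reading of "fast" as "no extra cap" are assumptions, as are all of the names. This is not how plus.net or Ellacoya actually configure their systems.

```python
from dataclasses import dataclass
from datetime import time

LINE_RATE_KBPS = 8000          # "up to 8Mbps" on every tier
FREE_FROM = time(22, 0)        # use after 10 PM appears not to count
FREE_UNTIL = time(8, 0)        # assumption: no end time is given
PEAK_FROM, PEAK_UNTIL = time(16, 0), time(22, 0)  # assumption: "peak" is undefined


@dataclass(frozen=True)
class Plan:
    monthly_price_gbp: float
    quota_gb: int
    peak_p2p_ftp_kbps: int  # speed cap applied to P2P and FTP at peak


ENTRY = Plan(9.99, 1, 50)
TOP = Plan(29.99, 40, LINE_RATE_KBPS)  # "fast" is modeled here as no extra cap


def counts_toward_quota(now: time) -> bool:
    """Overnight traffic is free; the window wraps past midnight."""
    return not (now >= FREE_FROM or now < FREE_UNTIL)


def rate_limit_kbps(plan: Plan, traffic_class: str, now: time) -> int:
    """Maximum speed for a classified flow under the given plan."""
    if traffic_class in {"p2p", "ftp"} and PEAK_FROM <= now < PEAK_UNTIL:
        return min(plan.peak_p2p_ftp_kbps, LINE_RATE_KBPS)
    return LINE_RATE_KBPS
```

The point of the sketch is that none of this is possible without classification: the rule needs to know that a flow is P2P or FTP before it can treat it differently from web traffic on the same line.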

How do they do it? With Ellacoya gear.

This can sound like nickel-and-diming, creating new ways to charge people for things (online gaming) that used to be free. But plus.net and Ellacoya both argue that it's actually a better deal for consumers because it lowers the price for those who need fewer features. According to this argument, users who don't want to play online games or download massive P2P files should not have to pay a share of the bandwidth for those who do. Traffic shaping can be used to set up a whole host of data packages to provide increased customization and, ultimately, lower costs for lighter users. Heavy users might actually see their fees increase as they're no longer subsidized by others on the network.

In fact, modern DPI gear can allow each individual subscriber to select services and speeds that are of most benefit to them, and every single user on the network can have a different set of rules in place (and pay a different price). Ellacoya's new marketing buzzword for this capability is "the Personal Internet."

Now, if all this talk of throttling and service restrictions hasn't yet caused you to think the words "network neutrality," you haven't been paying attention, because this is exactly the sort of talk that some people find offensive. "The 'Net was built on open access and non-discrimination of packets!" they argue, to which DPI vendors say, "ISPs must prepare for the exaflood."

Net neutrality, traffic shaping, and the "coming exaflood"

Let me put my cards on the table: I loathe the word "exaflood." It sounds like the sort of concept that would surface in a bad science fiction novel, one involving a sentient artificial intelligence, aliens who speak only in clicks, and a hard-boiled ex-space Marine with a shotgun. I'm not going to use it again, but if you're not familiar with the term (it's generally used not in any technical sense, but simply to mean "a whole lot of data"), check out this Wall Street Journal article or this freely-available reprint.

The idea here, from the perspective of the DPI vendors, is that the Internet now generates and streams more data than the current transmission network can handle without shaping or throttling. Senator Ted Stevens (R-AK) may have been widely ridiculed for his "series of tubes" analogy, but Internet connections are like tubes—each link can only transmit so much data at once (though "Internet tubes" can gain capacity over time, as fiber optic lines, DSL links, and cable lines have all done; this is part of Isenberg's point about why it's just cheaper to boost capacity). Given the voracious appetite of P2P users and streaming video watchers, this sort of content alone could cause delays for content that is arguably more critical and time-sensitive for an ISP's customers than an illicit Hollywood release or a video of a kid wiping out on a dirt bike: e-mail, instant messages, traditional web browsing.

Seen in these terms, the DPI vendors argue that ISPs which "do nothing" to shape traffic on their networks have actually made a choice. In this case, the choice is in favor of chaos and bottlenecks at peak periods. No matter how much bandwidth is currently thrown at the problem, P2P, Usenet, FTP, and streaming video will fill it (Ellacoya's CEO told me that "throwing bandwidth at the problem can't solve it"). Handling this ~~exaflood~~ data surge responsibly means using traffic shaping, at least during the periods of highest use.

This argument fits together nicely with another common one that I heard from DPI vendors: we help to make networks "fair." This was one of the claims made by plus.net (described above); why should it be fair for a few ultraheavy users of the network to drag down performance for everyone else? Traffic shaping gear is all designed to integrate easily with billing systems, making it easy to charge more money for heavier use. The corollary is that prices for more modest users should actually go down (whether that actually happens is another story).

Concerns over managed traffic

Now, this entire approach to managing traffic doesn't sit well with some folks who call for neutrality on their networks. Recent research has shown that a nondiscriminatory network will in fact require up to twice the peak bandwidth of a tiered and shaped network, but this doesn't necessarily mean that this is the more expensive approach. Pundits like David Isenberg have argued that simple overprovisioning is cheaper in the long run than investing in all the new DPI gear and the manpower to maintain and monitor it.

The debate is made complicated by the fact that "network neutrality" has a hundred differing definitions, making it something of a hundred-headed hydra. In the Journal article linked above, the author talks repeatedly about net neutrality as something that will force network providers to lease out access to competitors at government-dictated rates. Whatever else this idea might be, it's not what most people talk about when they refer to "net neutrality."

For a thoughtful definition, consider the one given by Daniel Weitzner, who cofounded the Center for Democracy & Technology, teaches at MIT, and works for the W3C. He lays out four points that neutral networks should adhere to:

Non-discriminatory routing of packets
User control and choice over service levels
Ability to create and use new services and protocols without prior approval of network operators
Nondiscriminatory peering of backbone networks.

Savetheinternet.com has spearheaded the network neutrality drive in Congress, and it has a shorter definition available: "Put simply, Net Neutrality means no discrimination. Net Neutrality prevents Internet providers from speeding up or slowing down Web content based on its source, ownership, or destination."

If that's not clear enough, they provide an example. "When we log onto the Internet, we take a lot for granted. We assume we'll be able to access any Web site we want, whenever we want, at the fastest speed, whether it's a corporate or mom-and-pop site. We assume that we can use any service we like—watching online video, listening to podcasts, sending instant messages—anytime we choose."

It's not hard to see why these particular constructions of "openness" run headlong into the business plans of the traffic-shapers. Companies like Ellacoya and Procera argue that this sort of "never discriminate" policy isn't much more than unworkable idealism. Such a network will in fact fill up with data; companies that don't filter or shape packet flows have then made a default decision to allow things like VoIP, videoconferencing, and online gaming to get "laggy" and e-mail to get delayed as BitTorrent and YouTube packets clog the tubes. Downloading an 800MB video, even if the movie in question is legal, is hardly the sort of application that is mission critical, and few customers are going to abandon ship because their YouTube videos take an extra two seconds to buffer. But customers do care if their VoIP service consistently goes glitchy or has tremendous lag, if World of Warcraft becomes unplayable, or if critical e-mails and IMs are delayed in transit.

The argument of the vendors is generally that "the market will decide" and that what's important is for companies simply to be upfront about the kinds of restrictions they have in place. We agree that transparency in these matters is a good idea, but the basic problem in the US is that if you don't like the policies your ISP has in place, it can be difficult to switch. We've been pointing out for years that Americans are generally locked into one or two providers, so most people are hardly spoiled for choice.

Where you come down on these questions may vary depending on where DPI gear is deployed; many people have fewer problems with its use by last-mile ISPs who interact directly with consumers. Throttling P2P traffic to keep the network open for other uses might be fine, but the concern is magnified when such gear is rolled out by the backbone operators, like AT&T and Verizon. With last-mile ISPs, at least (most) customers have some options for switching if they don't like the terms.

But there are so few backbone operators, and they wield so much power, that the truly scary scenario from a net neutrality perspective is one in which backbone providers start looking at Google and saying, "If you want decent transport over my pipes, then you have to pay my toll." When that type of demand comes from an upstream provider, that's a whole different ball game, from a network economics standpoint, than Comcast trying to soak Google by threatening to slow down access to Google.com.

That's because there's no way for the end users to vote "no" on the policy; all of the users of the multiple last-mile ISPs who are downstream from that backbone will see their access to Google start to suck, but there's not much they can do about it because it's not really their ISP's fault. In other words, the backbone providers have a more insular, more monopolistic, non-consumer-facing position in the Internet hierarchy, so if they decide to ditch neutrality and start squeezing websites and online service providers, then there's not much that can be done.

These are deep waters, and there are complex arguments to be made here (for a detailed engineering discussion of the issues facing "best effort" routing on a congested network, take a look at this IETF Internet-Draft by Sally Floyd and Mark Allman). DPI gear makes plenty of objectionable behaviors possible, but it also opens the door to network virus scans and DDoS defense mechanisms that could do real good. By making it possible to purchase access only to the specific services or protocols that one needs, DPI could also make the Internet cheaper for casual web and e-mail users. Like most technologies, the gear itself enables a great range of uses, and it's up to the operator to be responsible.

In fact, the Center for Democracy & Technology, which stands up for freedom of expression and privacy on the Internet, has no problem with many of DPI's projected uses. In its FCC comments regarding network neutrality, the group laid out a host of possible practices along with its thoughts on them (pp. 7-10). Blocking security threats, spam, and illegal content is unobjectionable to the CDT, as is prioritizing any content requested by the subscriber and prioritizing traffic based solely on the type of application (like VoIP). But blocking any traffic or actively degrading it would be off limits, as would priority given to traffic from specific ISPs or web site operators who have paid an additional fee.

Snooping for the feds: CALEA compliance

That's doubly true when it comes to doing user surveillance, since DPI gear makes it simple to collect and offload any user's entire datastream. ISPs are required to possess this capability under the Communications Assistance for Law Enforcement Act (CALEA), which started life as an update to traditional wiretapping laws. It has now been extended to VoIP operators and ISPs, who need a way to grab, archive, and submit to law enforcement any wiretap information requested in a warrant.

Much DPI gear is also CALEA-compliant. The boxes generally contain an "aux" port that can spit out a real-time copy of any required information: all traffic from a specific IP address, e-mail, Internet phone calls, URLs. The rules are simply programmed into the box's GUI and bam!—instant surveillance.

Full CALEA compliance can be a lot of work. It involves having someone available at all times to respond to any warrants that come in, someone who can set up and implement the correct rules, and more gear that can take the data and format it according to federal specifications, then make it available to the government. Many network operators don't want anything to do with this, so they simply install the DPI gear that makes it possible and contract out all the support and data formatting issues to another company, referred to as a "trusted third party" (TTP).

These TTPs handle all the grunt work; if given permission, they can even add the necessary surveillance rules to the DPI box remotely. Data from the user in question then flows from the ISP network to the TTP network, where it is passed along to the Feds. For this sort of logging to be most effective, DPI equipment needs to be installed near the edge of the network or as part of a gateway in order to ensure that both incoming and outgoing communications can be logged. It's extremely common for traffic between two places on the Internet to flow over different paths in each direction, so a box placed incorrectly can't observe both sides of the conversation, which is often necessary to really know what's going on.

Real-time monitoring is great, but what happens when you need to investigate a crime after it's happened? Plenty of information can also be logged to disk so that it can be accessed after the fact and used in these kinds of investigations. Storage needs to be thought out carefully, though; logging unfiltered traffic from a single gigabit Ethernet link can generate up to 10 terabytes a day, in each direction.
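The storage figure is easy to sanity-check with a few lines of arithmetic. The calculation assumes the link runs at its full line rate around the clock, which is the worst case, and uses decimal terabytes.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # one gigabit Ethernet link, fully loaded
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = LINK_BITS_PER_SECOND / 8 * SECONDS_PER_DAY
terabytes_per_day = bytes_per_day / 1e12  # decimal terabytes

print(f"{terabytes_per_day:.1f} TB per day, per direction")  # prints 10.8 TB ...
```

A saturated link therefore tops out at roughly 10.8TB a day in each direction, which squares with the round figure of 10 terabytes; a link that is only partly loaded generates proportionally less.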

Procera touts the story of LP Broadband, a small Colorado ISP that serves rural customers. LP Broadband was using a PL7600 DPI box with an optional statistics server, which logs far more traffic details than routine monitoring software. When an LP customer found that a business server had been compromised by hackers one night, they went to the authorities and obtained a court order that directed LP to turn over relevant records from the event.

The company was able to isolate the hacker's IP address and identified the time and duration of the hacking session; if the customer wanted, the Procera gear could simply block all further access from that particular IP address.

Coming soon to an ISP near you

DPI gear can be expensive, especially the kind that can simultaneously monitor hundreds of thousands of connections. But bandwidth isn't cheap, either, and disgruntled customers equal lost revenue. Blocking viruses, DDoS attacks, and hacking traffic on a network can also save bandwidth, user frustration, and tech support time. Both Ellacoya and Procera claim that their products pay for themselves within nine months (Ellacoya) or three to twelve months (Procera).

The rise of "lawful intercept" (CALEA) requirements and the growth of online video (both P2P and over HTTP) are making monitoring and shaping increasingly important to ISPs. Because of the firestorm surrounding network neutrality in the US, ISPs here tend to take a cautious approach to using this equipment, but it's far more common overseas.

BT, for instance, recently became Ellacoya's single largest customer, using its gear to support more than 3 million broadband subscribers. According to BT, deep packet inspection enables them to better monitor their network, but it also allows them to apply QoS to two important services. VoIP, to be useful, needs to move quickly, so BT gives it priority on the network. BT also runs its own IPTV system, with the data apparently flowing over the same network as user data. To prevent distortion in the TV signal whenever half the country decides to download an episode of Little Britain using P2P, BT uses QoS to make sure a fixed amount of bandwidth is always available to IPTV.
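One simple way to picture this kind of QoS is as a bandwidth allocator that serves a fixed IPTV reservation first, VoIP second, and lets everything else share what remains. The sketch below is an illustration only: the function name, the class names, and the proportional sharing of leftover capacity are assumptions, not a description of BT's or Ellacoya's actual scheme.

```python
def allocate(link_mbps: float, iptv_reserved_mbps: float, demand_mbps: dict) -> dict:
    """Split link capacity among traffic classes; returns Mbps granted per class."""
    grants = {}
    # IPTV always gets up to its fixed reservation, whatever else is happening.
    grants["iptv"] = min(demand_mbps.get("iptv", 0.0), iptv_reserved_mbps)
    remaining = link_mbps - grants["iptv"]
    # VoIP is served next because it is sensitive to delay.
    grants["voip"] = min(demand_mbps.get("voip", 0.0), remaining)
    remaining -= grants["voip"]
    # Everything else shares the leftover in proportion to its demand.
    others = {k: v for k, v in demand_mbps.items() if k not in grants}
    total = sum(others.values())
    scale = min(1.0, remaining / total) if total > 0 else 0.0
    for name, demand in others.items():
        grants[name] = demand * scale
    return grants
```

With a hypothetical 100Mbps link, a 30Mbps IPTV reservation, and demands of 30 (IPTV), 5 (VoIP), 120 (P2P), and 10 (web), the TV and voice traffic get everything they ask for while P2P and web are each scaled to half of their demand.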

As services like voice and TV continue their migration onto IP networks, DPI gear will only grow in importance. Is that a bad thing? It certainly doesn't have to be, but the time to debate the proper limits of shaping, blocking, and spying is now, before they become ubiquitous features of the ISP landscape.

