Wednesday, August 30, 2006

How Gnutella Works

At its peak, Napster was perhaps the most popular Web site ever created. In less than a year, it went from zero to 60 million visitors per month. Then it was shut down by a court order because of copyright violations, and wouldn't relaunch until 2003 as a legal music-download site.
The original Napster became so popular so quickly because it offered a unique product -- free music that you could obtain nearly effortlessly from a gigantic database. You no longer had to go to the music store to get music. You no longer had to pay for it. You no longer had to worry about cueing up a CD and finding a cassette to record it onto. And nearly every song in the universe was available.

Given that it was being used to distribute copyrighted music illegally, the original Napster's key weakness lay in its architecture: the way its creators designed the system. When the courts decided that Napster was promoting copyright infringement, its centralized design made it very easy for a court order to shut the site down.

The fact that Napster promoted copyright violations did not matter to its users. Most of them have turned to a new file-sharing architecture known as Gnutella. In this article, you will learn about the differences between Gnutella and the old Napster that allow Gnutella to survive today despite a hostile legal environment.

Napster's Architecture

On the Web as it is normally implemented, there are Web servers that hold information and process requests for that information (see How Web Servers Work for details). Web browsers allow individual users to connect to the servers and view the information. Big sites with lots of traffic may have to buy and maintain hundreds of machines to handle all of the requests from users.
Napster popularized the concept of peer-to-peer file sharing. With the old version of Napster (Napster relaunched itself in 2003 as a legal, pay-for-music site), individual people stored files that they wanted to share (typically MP3 music files) on their hard disks and shared them directly with other people. Users ran a piece of Napster software that made this sharing possible. Each user machine became a mini server.

If you logged into the old Napster to download a song, here's what happened:

You started the Napster software on your machine. Your machine became a small server able to make files available to other Napster users.
Your machine connected to Napster's central servers. It told the central servers which files were available on your machine. So the Napster central servers had a complete list of every shared song available on every hard disk connected to Napster at that time.
You typed in a query for a song. Let's say you were looking for the song "Roxanne" by The Police. Napster's central servers listed all of the machines storing that song.
You picked a version of the song from the list.
Your machine connected to the other user's machine that had that song and downloaded the song directly from it.
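The key design point in these steps is that Napster's central servers held only an index of which machine shared which song. The songs themselves never passed through those servers. Below is a minimal Python sketch of such an index. The class name, method names, and peer labels are hypothetical and chosen only for illustration. This is not Napster's actual protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical model of a Napster-style central server.

    It records which peer shares which song title; the songs themselves
    stay on the peers' hard disks.
    """

    def __init__(self):
        # song title -> set of peers currently sharing that title
        self._index = defaultdict(set)

    def register(self, peer, titles):
        # A peer logs in and reports the files available on its machine.
        for title in titles:
            self._index[title].add(peer)

    def unregister(self, peer):
        # A peer logs off, so its files are no longer listed.
        for peers in self._index.values():
            peers.discard(peer)

    def search(self, query):
        # Case-insensitive substring match on titles; returns (title, peer) pairs.
        # The download itself would then go directly from peer to peer.
        q = query.lower()
        return sorted(
            (title, peer)
            for title, peers in self._index.items()
            if q in title.lower()
            for peer in peers
        )


index = CentralIndex()
index.register("peer-a", ["Roxanne - The Police.mp3"])
index.register("peer-b", ["Roxanne - The Police.mp3", "Message in a Bottle.mp3"])
print(index.search("roxanne"))
# [('Roxanne - The Police.mp3', 'peer-a'), ('Roxanne - The Police.mp3', 'peer-b')]
```

In this model every search goes through one `CentralIndex` object. If you take that object away, there is nothing left to query, even though every peer still holds its files.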
The creator of Napster had a couple of reasons for this approach:
Napster eventually grew to have billions of songs available. There is no way a central server could have had enough disk space to hold all the songs, or enough bandwidth to handle all the requests.
Napster was trying to take advantage of a loophole in copyright law that allows friends to share music with friends. The legal concept behind Napster was, "All of these people are sharing the songs on their hard disks with their friends." The courts did not agree with that logic, but it gave Napster enough time to prove the concept and grow to massive size.
This approach worked great and made fantastic use of the Internet's architecture. By spreading the load for file downloading across millions of machines, Napster accomplished what would have been impossible any other way.
The central database for song titles was Napster's Achilles' heel. When the court ordered Napster to stop the music, shutting down that central database killed the entire original Napster network.

With the original Napster gone, something like 100 million people around the world were left hungry to share more and more files. It was only a matter of time before another system came along to fill the gap.

Gnutella's Architecture

Currently, the most popular system for sharing files is another peer-to-peer network called Gnutella, or the Gnutella network. There are two main similarities between Gnutella and the old Napster:
Users place the files they want to share on their hard disks and make them available to everyone else for downloading in peer-to-peer fashion.
Users run a piece of Gnutella software to connect to the Gnutella network.
There are also two big differences between Gnutella and the old Napster:
There is no central database that knows all of the files available on the Gnutella network. Instead, search queries are passed from machine to machine across the network, and any machine holding a matching file replies. This is a distributed query approach.
There are many different client applications available to access the Gnutella network.
Because of both of these features, it would be difficult for a simple court order to shut Gnutella down. The court would have to find a way to block all Gnutella network traffic at the ISP and the backbone levels of the Internet to stop people from sharing.
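The distributed query approach can be pictured as a flood. The searching machine sends the query to the machines it is connected to. Each of those checks its own files and passes the query on to its own neighbors, and so on.

In the original Gnutella protocol (version 0.4), each query carries two things that keep the flood under control:

- A unique message ID, which lets a machine drop copies it has already seen.
- A time-to-live (TTL) counter, commonly set to 7, which is decremented at each hop so the query eventually stops spreading.

The Python sketch below simulates that behavior inside a single process on a made-up network. The function name and the dictionary-based network are hypothetical. Real clients exchange messages over TCP and route replies back along the path the query arrived on; the simulation skips all of that.

```python
from collections import deque


def flood_query(network, shared_files, origin, keyword, ttl=7):
    """Simulate a Gnutella-style flooded search (illustrative only).

    network      -- dict: machine -> list of machines it is connected to
    shared_files -- dict: machine -> list of file names it shares
    origin       -- machine that issues the search
    ttl          -- number of hops the query may travel
    Returns sorted (machine, file) hits; the origin's own files are skipped.
    """
    seen = {origin}  # stands in for the duplicate check on each message ID
    queue = deque([(origin, ttl)])
    hits = []
    while queue:
        machine, remaining = queue.popleft()
        if machine != origin:
            # This machine processes the query against its own shared files.
            for name in shared_files.get(machine, []):
                if keyword.lower() in name.lower():
                    hits.append((machine, name))
        if remaining == 0:
            continue  # TTL used up: do not forward any further
        for neighbor in network.get(machine, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return sorted(hits)


network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
shared = {"B": ["Roxanne.mp3"], "C": ["Roxanne (live).mp3"], "D": ["Roxanne.mp3"]}
print(flood_query(network, shared, "A", "roxanne", ttl=2))
# [('B', 'Roxanne.mp3'), ('C', 'Roxanne (live).mp3')]
# D is three hops from A, beyond the TTL of 2, so it never sees the query.
```

Notice that no single machine in this sketch holds the whole picture. Removing any one node still leaves the others able to search one another, which is why a court order against one site would not be enough.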

XoloX Example: Searching

XoloX is a typical, fairly simple program for connecting to the Gnutella network. It does not have some of the bells and whistles of the more sophisticated clients, but it does work, it is a small file to download (only 600 kilobytes or so), it has no "spyware" or bundled pop-up advertising mixed in with it, and it is very easy to install and use. Its simplicity makes it useful to demonstrate how a typical Gnutella client works.

There are three big things you can do with XoloX: search for files, transfer files to your machine and look at your downloaded files. Three buttons at the top of the XoloX window let you toggle between these activities.

The figure above shows a typical screenshot during a search. All you do is type in the name (or keywords) of the file you are looking for. You can also select the file type: audio, video, etc., or "All Types." Your XoloX client sends out a message containing your search string, and over the course of 30 to 60 seconds a search window fills with results from the thousands of other machines that are processing your query.
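Each machine that receives your search message has to decide which of its own shared files answer it. The following Python sketch shows one plausible way to do that, assuming two rules: every keyword must appear in the file name, and the file-type menu is checked by file extension. The `FILE_TYPES` table, its extensions, and both function names are hypothetical. XoloX's actual matching rules are not described in the article.

```python
import os

# Hypothetical mapping from the search window's file-type choices to
# extensions; a real client's categories and extensions may differ.
FILE_TYPES = {
    "audio": {".mp3", ".wav", ".ogg"},
    "video": {".avi", ".mpg", ".mpeg"},
}


def matches(file_name, search_string, file_type="All Types"):
    """True if one shared file answers the query: every keyword appears in
    the name (case-insensitive) and the extension fits the chosen type."""
    name = file_name.lower()
    if file_type != "All Types":
        extension = os.path.splitext(name)[1]
        if extension not in FILE_TYPES.get(file_type.lower(), set()):
            return False
    return all(word in name for word in search_string.lower().split())


def answer_query(shared_files, search_string, file_type="All Types"):
    # The subset of this machine's shared files it would report back.
    return [f for f in shared_files if matches(f, search_string, file_type)]


my_files = ["The Police - Roxanne.mp3", "Roxanne (video).avi", "Notes.txt"]
print(answer_query(my_files, "police roxanne", "Audio"))
# ['The Police - Roxanne.mp3']
print(answer_query(my_files, "roxanne"))
# ['The Police - Roxanne.mp3', 'Roxanne (video).avi']
```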

One thing you will notice in the search window is a score, which represents the number of machines currently online that have the same file available. By choosing a file with a high score, you increase your odds of actually getting the file you want: if one of those machines goes offline or is too busy to respond, another can still supply it.
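The article does not say exactly how XoloX computes its score. What follows is only a sketch of the idea described above, in three steps:

1. Collect the replies.
2. Group replies that describe the same file.
3. Count how many distinct machines offer each one.

Two details are assumptions of this sketch: each reply is represented as a `(host, file_name, size)` tuple, and two replies count as the same file when both name and size match. The function name `score_results` is hypothetical.

```python
from collections import defaultdict


def score_results(hits):
    """Rank search results by how many machines offer each file.

    hits: iterable of (host, file_name, size_in_bytes) replies.
    Treating "same name and same size" as "same file" is an assumption of
    this sketch, not necessarily how XoloX decides.
    Returns [(score, file_name, size), ...], highest score first.
    """
    hosts_per_file = defaultdict(set)
    for host, name, size in hits:
        hosts_per_file[(name, size)].add(host)  # a set ignores repeat replies from one host
    ranked = [(len(hosts), name, size) for (name, size), hosts in hosts_per_file.items()]
    ranked.sort(key=lambda r: (-r[0], r[1]))  # highest score first, then by name
    return ranked


replies = [
    ("peer-a", "Roxanne.mp3", 3_840_000),
    ("peer-b", "Roxanne.mp3", 3_840_000),
    ("peer-b", "Roxanne.mp3", 3_840_000),
    ("peer-c", "Roxanne (live).mp3", 5_100_000),
]
print(score_results(replies))
# [(2, 'Roxanne.mp3', 3840000), (1, 'Roxanne (live).mp3', 5100000)]
```

Counting distinct hosts rather than raw replies matters here. In the example, peer-b answers twice, but it is still only one machine you could download from.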

Is Gnutella Legal?

Gnutella itself is legal. There is no law against sharing public domain files. It's when people use Gnutella to distribute copyrighted music and films that its use becomes illegal. This is the problem that got Napster in trouble. The music industry has publicly objected to Gnutella, but there is currently no easy way to control it.
Attacking the Gnutella architecture is one way to disrupt file-sharing activities. There are currently two approaches being used:

Overloading the Gnutella network with a flood of bogus search packets.
Planting corrupted files on machines connected to the Gnutella network.
Gnutella's many developers have adapted to problems in the past, so it is probable that new software can work around these threats and keep the files flowing.
The debate at the moment centers on how much financial damage file-sharing actually causes. Is a shared file theft, or is it a form of free advertising and exposure, much like radio airtime?