Understanding the Human Component of P2P File Sharing

January 29, 2015

Millions of people turn to BitTorrent each month, making it the world’s most popular peer-to-peer communication protocol. But how and why are they using it? A multi-university team of researchers investigated this question, revealing some unexpected findings and an opportunity to build a better BitTorrent.

Among the researchers was Northeastern University College of Computer and Information Science (CCIS) Assistant Professor David Choffnes, who first joined the project while a Northwestern University doctoral student. Choffnes was involved in data collection and analysis, but his most important contribution turned out to be Ono, the plugin he had created to improve BitTorrent’s response time and download speed.

“This software, with user consent, let us collect anonymous data,” Choffnes explains. “In this privacy-sensitive age, we took great care in protecting users’ anonymity. We tied our hands intentionally so that users’ privacy came first.”

Choffnes and other members of the research team at Northwestern and in Spain never identified users as individuals. They knew the size but not the content of downloaded files. They obtained only coarse geographic data—the country but not the specific location of the user. Still, this provided enough information to better understand BitTorrent users and their behavior.

“We used the size of the file downloaded to ballpark the content,” says Choffnes, noting that this allowed the researchers to determine whether people were choosing to download movies, music, television shows, or other media.
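As a rough illustration of the size-based heuristic Choffnes describes, the Python sketch below maps a file size to a likely content category. The thresholds, category labels, and the function name guess_category are hypothetical and chosen only for illustration; the article does not state the cutoffs the researchers actually used.

```python
# Hypothetical upper bounds in megabytes. These are illustrative
# assumptions, not the values used in the study.
SIZE_BUCKETS_MB = [
    (20, "music track or small file"),
    (250, "music album or short video"),
    (600, "TV episode"),
]
DEFAULT_CATEGORY = "movie or other large file"


def guess_category(size_bytes: int) -> str:
    """Guess a content category from file size alone (a coarse heuristic)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    size_mb = size_bytes / (1024 * 1024)
    # Return the first bucket whose upper bound exceeds the file size.
    for upper_bound_mb, label in SIZE_BUCKETS_MB:
        if size_mb < upper_bound_mb:
            return label
    return DEFAULT_CATEGORY
```

A heuristic like this needs only the file size, which is consistent with the researchers knowing the size but not the content of each download.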

A key question the researchers answered was what type of content users typically download. Because downloading files on BitTorrent is free, one might expect a wide range of behavior, such as hoarding as much content as possible or focusing on certain types of files. “Most users were specialists,” Choffnes says. “They tended to focus on individual categories or small groups of content, and they were consistent in the categories they chose. People didn’t randomly choose content.”
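One simple way to make the idea of a "specialist" concrete is to measure how much of a user's download history falls into a single category. The sketch below is an assumption for illustration only: the function names and the 0.7 threshold are hypothetical, and the article does not describe the measure the researchers used.

```python
from collections import Counter


def dominant_share(categories):
    """Return (most common category, its share of the user's downloads)."""
    if not categories:
        raise ValueError("categories must not be empty")
    counts = Counter(categories)
    top_category, top_count = counts.most_common(1)[0]
    return top_category, top_count / len(categories)


def is_specialist(categories, threshold=0.7):
    """Hypothetical rule: a user is a specialist if one category
    accounts for at least `threshold` of their downloads."""
    _, share = dominant_share(categories)
    return share >= threshold
```

For example, a history of three movies and one music file gives a dominant share of 0.75 for movies, which this hypothetical rule would label as specialist behavior.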

The researchers were also able to identify differences in user behavior based on geography. They found a link between a country’s gross domestic product (GDP) and the type of content its users downloaded from BitTorrent. Choffnes notes, for example, that users in Spain and Lithuania often downloaded videos, while far fewer did so in the United States. He says, “In the United States, it’s reasonable to assume the content is available elsewhere. There are a lot of other places to access video content for free or for a reasonable price, such as Netflix or Hulu, that are much better suited to video than BitTorrent.”

Another discovery was the strong negative correlation between regard for intellectual property and the size of a download. Choffnes explains, “The more value there is for data copyright and ability to enforce it, the less likely a user is to download a file.”

The researchers were sometimes surprised by what they learned. Choffnes says, “We originally expected a correlation with broadband performance, but people downloaded files from BitTorrent independent of broadband speed.”

The full findings were recently published in Proceedings of the National Academy of Sciences of the United States of America. But, as Choffnes explains, the team’s work has potential benefits that extend beyond the research community.

“BitTorrent was designed for general use, but now we know that most people use it in a specific way. We may be able to optimize it for a common case,” he says. “It wasn’t designed for streaming video, but if people download movies, we may be able to make a better user experience. We can look at how we can design a better BitTorrent.”