ABSTRACT

Good morning. Today I’m going to talk about network science. In light of the presentations we have today, my goal is to offer a rather different perspective: to argue that many of the things we see in the social environment are rooted in fundamental laws that are obeyed not only by social systems but by a wide array of networks. Social systems are one of the most powerful examples of networks because we understand and relate to them in an everyday fashion. In a social network the nodes are the individuals and the links correspond to relationships: who is talking to whom, who is communicating with whom on a regular basis.
What I would like to do today is to examine how we think about such networks. Let’s assume that you’ve been given the full set of relationships in a social networking website such as Facebook. How would you analyze data of such density and richness? If we think about these types of networks in mathematical terms, we have to go back to the mathematicians Pál Erdős and Alfréd Rényi and the question they asked about how to model a real network. As mathematicians, they thought of networks in fundamentally simple terms: nodes and links. But the challenge for them was that they didn’t know how, in nature or society, nodes decide to link together. So Erdős and Rényi made the assumption that links are assigned randomly, which means that any two nodes have a certain probability of being connected, making the network a fundamentally random object. Since 1960, mathematicians have invested a huge amount of work in understanding these random networks. As an illustration, if we start with a probability of p = 0, which means that the probability that any node is connected to another node is zero, and then increase the probability of a connection, adding links to the network, clusters will start to emerge.
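The random-link assumption is easy to try out. What follows is only a minimal sketch, not anything from the original work: the function name `largest_cluster_size`, the network size, and the fixed seed are illustrative assumptions. It links every pair of nodes independently with probability p and reports the size of the largest cluster.

```python
import random
from collections import Counter


def largest_cluster_size(n, p, seed=0):
    """Link every pair of n nodes with probability p; return the largest cluster size."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    parent = list(range(n))    # union-find forest: each node starts as its own cluster

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:           # the random-link assumption
                parent[find(i)] = find(j)  # merge the two clusters

    sizes = Counter(find(x) for x in range(n))
    return max(sizes.values())


if __name__ == "__main__":
    n = 300  # illustrative network size
    for average_links in (0.0, 0.5, 1.0, 1.5, 2.0, 4.0):
        p = average_links / (n - 1)
        print(average_links, largest_cluster_size(n, p))
```

In the standard theory of this model, the largest cluster stays small until the average number of links per node passes roughly one, and then grows quickly. The exact numbers printed depend on the seed and on n.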
If we continue to add more links to the system, at a certain moment these clusters will start joining each other. This is when the network actually emerges. So there is this “magical” moment that mathematically takes us from lots of disconnected clusters to the emergence of what mathematicians call a “giant component.” When networks emerge through this process, it is very sudden.
So, we find ourselves with two questions. First, is this representation of how a network emerges correct? And second, what does it mean? Let’s first address the “What does it mean?” question. One consequence of the random network model is that if you count how many links each node has, which gives what we call the “degree distribution” of the network, you will find a Poisson distribution. This means that if Facebook were a random network, you would find that most individuals have approximately the same number of friends, and that only very few individuals have a very large number of friends or no friends whatsoever. In fact, when it comes to their circle of friends, most individuals would be similar to each other. In a sense, the random network describes a society that is fundamentally very democratic: everyone has roughly the same number of friends, and it’s very difficult to find individuals who are significantly richer or significantly poorer in terms of their social ties than the average person. So, despite the randomness with which the links are placed, the randomness gets averaged out, and in the end we all become very similar to each other.
Now, we need to question whether this is correct. Do we honestly believe that real networks (society, the Internet, or other systems) are truly random, decided by chance? No one would question that there is a large degree of randomness in the way we make friends and in the way certain things are connected. But is that all, or is there more to it? To answer this question, about a decade ago we started to collect large data sets, large maps of networks, with the idea that we needed to examine real networks to understand how they actually worked. Our first choice was the World Wide Web, a large network where the nodes are documents and the links are URLs. It wasn’t a philosophical decision; it was simply available data that we could actually map out. We started in 1999 from the main page of the University of Notre Dame and followed the links. Then we followed the links on the pages we reached. It was a terribly boring process, so we built software to do this; these days, it is called a search engine. But unlike Google, which runs similar search engines, we didn’t care about the content of the pages. We only cared about the links and what they were actually connected to. So at the end of the day, this robot returned a map in which each node corresponds to a Web page and each link tells you the connection to another page that can be made with a single click.
What was our expectation? Well, Web pages are created by individuals who differ significantly from one another. Some people care about social systems. Others care about the Red Sox or the White Sox, and still others care about Picasso.
And what people put on Web pages reflects these personal interests. Given the huge differences between us, it’s reasonable to expect that a very large network would have a certain degree of randomness. And we expected that when we counted how many links each Web page had, the network would follow a Poisson distribution, as predicted by the random network model. Surprisingly, however, our results showed something different. We found a large number of very small nodes with only a few links each, and a few very highly connected nodes. We found what we call a “power law distribution.” That is, P(k) ~ k^(-γ), where P(k) is the probability that a node has k links and γ is called the “degree exponent.”
What is a power law distribution? A power law distribution appears on a regular plot as a continuously and gradually decreasing curve. Whereas a Poisson distribution has an exponentially decaying tail, one that drops off very sharply, a power law distribution has a much slower decay rate, resulting in a long tail. This means that not only are there numerous small nodes, but that these numerous small nodes coexist with a few very highly connected nodes, or hubs.
To illustrate, a random network would look similar to the highway system of the United States, where the cities are the nodes and the links are the highways connecting them. Obviously, it doesn’t make sense to build a hundred highways going into a city, and each major city in the mainland U.S. is connected by a highway. So if you were to draw a histogram of the number of major highways that meet in major cities, you would find the average to be around two or three. You wouldn’t find any city that has a very large number of highways going in or out. In comparison, a map of airline routes shows many tiny airports and a few major hubs that have many flights going in and out; these hubs hold the whole network together. The difference between these two types of networks is the existence of these hubs. The hubs fundamentally change the way the network looks and behaves. These differences become more evident when we think about travel from the East Coast to the West Coast. If you go on the highway system, you need to travel through many major cities. When you fly, you fly to Chicago, and from Chicago you can reach just about any other major airport in the U.S. The way you navigate an airline network is fundamentally different from the way you navigate the highway system, and it’s because of the hubs.
So we saw that the Web happens to be like the airline system. The hubs are obvious (Google, Yahoo, and other websites everybody knows), and the small nodes are our own personal Web pages. So the Web happens to be this funny animal dominated by hubs, what we call a “scale-free network.” When I say “scale-free network,” all I mean is that the network has a power law degree distribution; for all practical purposes you can visualize such a network as dominated by a few hubs.
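To put rough numbers on the difference between the two tails, here is a small sketch that evaluates both distributions at a few values of k. The Poisson mean of 5, the degree exponent of 2.5, and the cutoff `k_max` are illustrative assumptions, not values measured in the study described above.

```python
import math
from functools import lru_cache


def poisson_pmf(k, mean):
    """Probability of exactly k links under a Poisson distribution."""
    # Computed in log space so that large k does not overflow the factorial.
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


@lru_cache(maxsize=None)
def power_law_norm(gamma, k_max):
    """Normalising constant: sum of k**-gamma for k = 1..k_max."""
    return sum(j ** -gamma for j in range(1, k_max + 1))


def power_law_pmf(k, gamma, k_max=100_000):
    """P(k) proportional to k**-gamma on 1..k_max (k_max is an assumed cutoff)."""
    return k ** -gamma / power_law_norm(gamma, k_max)


if __name__ == "__main__":
    mean, gamma = 5.0, 2.5  # illustrative parameters only
    for k in (1, 5, 10, 50, 100):
        print(k, poisson_pmf(k, mean), power_law_pmf(k, gamma))
```

Near the mean the Poisson probability is the larger of the two. Far out in the tail, the Poisson value is many orders of magnitude smaller than the power law value, which is why a Poisson network has no hubs and a power law network does.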
So we asked, is the structure of the Web unique, or are there other networks that have similar properties? Take for example the map of the Internet. Despite the fact that many people use the terms Internet and Web interchangeably, the Internet is very different from the Web because it is a physical network. On the Web, it doesn’t cost any more money to connect with somebody who is next door than it does to connect to China. But with the Internet, placing a cable between here and China is quite an expensive proposition. On the Internet the nodes correspond to routers and the links correspond to physical cables. Yet, if we inspect any map of the Internet, we see a couple of major hubs that hold together many, many small nodes. These hubs are huge routers. Actually, the biggest hub in the United States is in the Midwest, in a well-guarded underground facility. We’ll see why in a moment. Thus, like the Web, the Internet is also a hub-dominated structure. I want to emphasize that the Web and the Internet are very different animals. Yet, when you look at their underlying structures, and particularly if you mathematically analyze them, you will find that they are both scale-free networks.
Let’s take another example. I’m sure everybody here is familiar with the Kevin Bacon game, where the goal is to connect an actor to Kevin Bacon. Actors are connected if they appeared in a movie together. So Tom Cruise has a Kevin Bacon number of one because they appeared together in A Few Good Men. Mike Myers never appeared with Kevin Bacon, but he appeared with Robert Wagner in The Spy Who Shagged Me, and Robert Wagner appeared with Kevin Bacon in Wild Things. So Myers is two links away. Even historical figures like Charlie Chaplin or Marilyn Monroe are connected to Bacon by two to three links. There is a network behind Hollywood, and you can analyze the historical data from all the movies made from 1890 to today to study its structure. Once again, if you do that, you will find exactly the same power law distribution as we saw earlier. Most actors have only a few links to other actors, but there are a few major hubs that hold the whole network together. You may not know the names of the actors with few links because you walked out of the movie theater before their names came up on the screen. On the other hand there are the hubs, the actors you go to the movie theater to see. Their names are on the ads and feature prominently on the posters.
Let’s move to the subject of this conference, online communities. Here, the nodes are the members. And though we don’t know who they are, their friends do, and these relationships with friends are the links. There are many ways to look at these relationships. One early study from 2002 examined email traffic in a university environment, and sure enough, a scale-free network emerged there as well.
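The Kevin Bacon number described above is a shortest-path distance in the co-appearance network, and it can be computed with a breadth-first search. The sketch below uses only the three films named in the talk. The names `FILMS`, `build_graph`, and `bacon_number` are illustrative, and a real analysis would need the full film database.

```python
from collections import deque

# Co-appearances taken from the examples in the talk; deliberately a toy data set.
FILMS = {
    "A Few Good Men": ["Tom Cruise", "Kevin Bacon"],
    "The Spy Who Shagged Me": ["Mike Myers", "Robert Wagner"],
    "Wild Things": ["Robert Wagner", "Kevin Bacon"],
}


def build_graph(films):
    """Link every pair of actors who share a film."""
    graph = {}
    for cast in films.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(a for a in cast if a != actor)
    return graph


def bacon_number(graph, actor, target="Kevin Bacon"):
    """Fewest links from actor to target, or None if there is no path."""
    if actor not in graph:
        return None
    distance = {actor: 0}
    queue = deque([actor])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for neighbour in graph[current]:
            if neighbour not in distance:  # first visit gives the shortest distance
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    graph = build_graph(FILMS)
    for name in ("Tom Cruise", "Mike Myers"):
        print(name, bacon_number(graph, name))
```

Breadth-first search visits actors in order of distance from the starting actor, so the first time it reaches Kevin Bacon it has found the smallest number of links.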
Another study looked at a precursor to Facebook, a social networking site in Sweden, and exactly the same kind of distribution arose there. No matter what measure the researchers looked at, whether people just poked each other, traded email, or had a relationship, the same picture emerged: most people had only a few links and a few had a large number.
But all the examples I have given you so far came from human-made systems, which may suggest that the scale-free property is rooted in something we do. We built the Internet and the Web, we do social networking, we do email. So perhaps these hubs emerge from something intrinsic in human behavior. Is it so? Let’s talk about what’s inside us. One of the many components in humans is genes, and the role of the genes is to generate proteins. Much of the dirty work in our cells is done not by the genes, but by the proteins. And proteins almost never work alone. They always interact with one another in what is known as protein-protein interaction. For example, if you look in your bloodstream, oxygen is carried by hemoglobin. Hemoglobin essentially is a molecule made of four proteins that attach together and carry oxygen. The proteins are nodes in a protein-protein interaction network, which is crucial to how the cell actually works. When it breaks down, it brings on disease. There’s also a metabolic network inside us, which takes the food that you eat and breaks it down into the components that the cells can consume. It’s a network of chemical reactions. So the point is that there are many networks in our cells. On the left-hand side of this figure is the metabolic network of the simple yeast organism. On the right-hand side is the protein-protein interaction network. In both cases, if you analyze them mathematically you will observe a scale-free network; visually you can see the hubs very clearly.
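The analysis applied to each network in this talk starts the same way: count the links on every node, then tabulate how many nodes have each link count. A minimal sketch of that step follows. The edge list `TOY_EDGES` is a hypothetical toy network, not any of the data sets mentioned, and the function names are illustrative.

```python
from collections import Counter


def degree_distribution(edges):
    """Return ({node: links}, {links: number of nodes with that many links})."""
    degree = Counter()
    for a, b in edges:
        # Each undirected link adds one to the count of both endpoints.
        degree[a] += 1
        degree[b] += 1
    # Nodes with no links never appear in an edge list, so they are not counted here.
    return dict(degree), dict(Counter(degree.values()))


def hubs(degree, top=1):
    """The `top` most highly connected nodes."""
    return sorted(degree, key=degree.get, reverse=True)[:top]


# Hypothetical toy network: one hub ("H") linked to five small nodes, plus one extra link.
TOY_EDGES = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"), ("H", "e"), ("a", "b")]

if __name__ == "__main__":
    degree, distribution = degree_distribution(TOY_EDGES)
    print(distribution)
    print(hubs(degree))
```

On a real network, plotting the second dictionary on logarithmic axes is one common way to check whether the tail looks like a power law, though a plot alone is only suggestive.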