The structure of scientific collaboration networks
Abstract
The structure of scientific collaboration networks is investigated. Two scientists are considered connected if they have authored a paper together, and explicit networks of such connections are constructed by using data drawn from a number of databases, including MEDLINE (biomedical research), the Los Alamos e-Print Archive (physics), and NCSTRL (computer science). I show that these collaboration networks form “small worlds,” in which randomly chosen pairs of scientists are typically separated by only a short path of intermediate acquaintances. I further give results for the mean and distribution of the numbers of collaborators of authors, demonstrate the presence of clustering in the networks, and highlight a number of apparent differences in the patterns of collaboration between the fields studied.
A social network is a collection of people, each of whom is acquainted with some subset of the others. Such a network can be represented as a set of points (or vertices) denoting people, joined in pairs by lines (or edges) denoting acquaintance. One could, in principle, construct the social network for a company or firm, for a school or university, or for any other community up to and including the entire world.
Social networks have been the subject of both empirical and theoretical study in the social sciences for at least 50 years (1–3), partly because of inherent interest in the patterns of human interaction, but also because their structure has important implications for the spread of information and disease. It is clear, for example, that variation in just the average number of acquaintances that individuals have (also called the average degree of the network) might substantially influence the propagation of a rumor, a fashion, a joke, or this year's flu.
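As a concrete illustration of the representation just described, the following minimal sketch stores a small acquaintance network as a map from each person to the set of people they know, and computes its average degree. The four-person network, the names, and the function names are hypothetical, invented purely for illustration; none of them is drawn from the data sets studied here.

```python
from collections import defaultdict


def build_network(edges):
    """Return an adjacency map {vertex: set of neighbours} for an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        # Acquaintance is treated as mutual, so each edge is stored in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return dict(neighbours)


def average_degree(neighbours):
    """Mean number of acquaintances per person (the average degree of the network)."""
    if not neighbours:
        return 0.0
    return sum(len(n) for n in neighbours.values()) / len(neighbours)


# Hypothetical acquaintance pairs, invented for illustration only.
edges = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan"), ("Ann", "Cat")]
network = build_network(edges)
print(average_degree(network))  # 4 edges among 4 people: 2 * 4 / 4 = 2.0
```

Because the network is built from a list of edges, a person with no acquaintances at all would not appear in it; a fuller treatment would need to add such isolated vertices explicitly.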
A famous early empirical study of the structure of social networks, conducted by Stanley Milgram (4), asked test subjects, chosen at random from a Nebraska telephone directory, to get a letter to a target subject in Boston, a stockbroker friend of Milgram's. The instructions were that the letters were to be sent to their addressee (the stockbroker) by passing them from person to person, but that they could be passed only to someone whom the passer knew on a first-name basis. Because it was not likely that the initial recipients of the letters were on a first-name basis with a Boston stockbroker, their best strategy was to pass their letter to someone whom they felt was nearer to the stockbroker in some sense, either social or geographical: perhaps someone they knew in the financial industry, or a friend in Massachusetts.
A moderate number of Milgram's letters did eventually reach their destination, and Milgram discovered that the average number of steps taken to get them there was only about six, a result that has since passed into folklore and was immortalized by John Guare in the title of his 1990 play, Six Degrees of Separation (5). Although there were certainly biases present in Milgram's experiment—letters that took a longer path were perhaps more likely to get lost or forgotten, for instance (6)—his result is usually taken as evidence of the “small-world hypothesis,” that most pairs of people in a population can be connected by only a short chain of intermediate acquaintances, even when the size of the population is very large.
Milgram's work, although cleverly conducted and in many ways revealing, does not, however, tell us much about the detailed structure of social networks, data that are crucial to the understanding of information or disease propagation. Many other studies have addressed this problem (discussions can be found in refs. 1–3). Foster et al. (7), Fararo and Sunshine (8), and Moody and White (9), for instance, all conducted studies of friendship networks among middle- or high-school students, Bernard et al. (10) did the same for communities of Utah Mormons, Native Americans, and Micronesian islanders, and there are many other examples to be found in the literature. In these studies, friendships were determined by surveys or interviews.
Although these studies directly probe the structure of the relevant social network, they suffer from two substantial shortcomings that limit their usefulness. First, the studies are labor intensive, and the size of the network that can be mapped is therefore limited—typically to a few tens or hundreds of people. Second, these studies are highly sensitive to subjective bias on the part of interviewees; what is considered to be an “acquaintance” can differ considerably from one person to another. To avoid these issues, a number of researchers have studied networks for which there exist more numerous data and more precise definitions of connectedness. Examples of such networks are the electric power grid (3, 11), the Internet (12, 13), and the pattern of air traffic between airports (14). These networks, however, suffer from a different problem: although they may loosely be said to be social networks in the sense that their structure in some way reflects features of the society that built them, they do not directly measure actual contact between people. Many researchers, of course, are interested in these networks for their own sake, but to the extent that we want to know about human acquaintance patterns, power grids and computer networks are a poor proxy for the real thing.
Perhaps the nearest that studies of this kind have come to looking at a true acquaintance network is in studies of the network of movie actors (11, 14). In this network, which has been thoroughly documented and contains nearly half a million people, two actors are considered connected if they have been credited with appearance in the same film. However, although this is genuinely a network of people, it is far from clear that the appearance of two actors in the same movie implies that they are acquainted in any but the most cursory fashion, or that their acquaintance extends off screen. To draw conclusions about patterns of everyday human interaction from the movies would, it seems certain, be a mistake.
In this paper, I present a study of a genuine network of human acquaintances that is large—containing over a million people—and for which a precise definition of acquaintance is possible. That network is the network of scientific collaboration, as documented in the papers scientists write.
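The definition of connection used here translates directly into a construction procedure: every pair of names appearing on the same paper is joined by an edge. The sketch below is a minimal illustration under stated assumptions, not the procedure used to obtain the results of this paper. The three author lists are invented, the function names are hypothetical, and authors are identified by the exact text of their names, which ignores the real difficulty of distinguishing different scientists who share a name. The second function uses a standard breadth-first search to find the number of steps separating two authors.

```python
from collections import deque
from itertools import combinations


def collaboration_network(papers):
    """Build {author: set of coauthors} from a list of author lists, one per paper."""
    neighbours = {}
    for authors in papers:
        unique = set(authors)
        for a in unique:
            # Register every author, so that sole authors appear with no collaborators.
            neighbours.setdefault(a, set())
        for a, b in combinations(unique, 2):
            # Two scientists are connected if they share at least one paper.
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def distance(neighbours, source, target):
    """Number of edges on a shortest path from source to target, or None if unconnected."""
    if source not in neighbours or target not in neighbours:
        return None
    steps = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return steps[v]
        for w in neighbours[v]:
            if w not in steps:
                steps[w] = steps[v] + 1
                queue.append(w)
    return None


# Hypothetical author lists, invented for illustration only.
papers = [
    ["A. Author", "B. Author"],
    ["B. Author", "C. Author", "D. Author"],
    ["E. Author"],
]
net = collaboration_network(papers)
print(len(net["B. Author"]))                   # number of collaborators of B. Author: 3
print(distance(net, "A. Author", "D. Author"))  # A to B to D: 2
print(distance(net, "A. Author", "E. Author"))  # no connecting path: None
```

A paper with n distinct authors contributes n(n-1)/2 edges in this construction, so a single paper with a very long author list can dominate the collaborator counts of everyone on it.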
Numbers in parentheses are standard errors on the least significant figures.
Acknowledgments
I am indebted to Paul Ginsparg and Geoffrey West (Los Alamos e-Print Archive), Carl Lagoze (NCSTRL), Oleg Khovayko, David Lipman, and Grigoriy Starchenko (MEDLINE), and Heath O'Connell (SPIRES) for making available the publication data used for this study. I also thank Dave Alderson, Paul Ginsparg, Laura Landweber, Ronald Rousseau, Steve Strogatz, and Duncan Watts for illuminating conversations. This work was funded in part by a grant from Intel Corporation to the Santa Fe Institute Network Dynamics Program. The NCSTRL digital library was made available through the Defense Advanced Research Projects Agency (DARPA)/Corporation for National Research Initiatives test suites program funded under DARPA Grant N66001-98-1-8908. The Los Alamos e-Print Archive is funded by the National Science Foundation under Grant PHY-9413208.
Footnotes
Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.021544898.
Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.021544898
References
- 1. Wasserman S, Faust K. Social Network Analysis. Cambridge: Cambridge Univ. Press; 1994.
- 2. Scott J. Social Network Analysis. London: Sage Publications; 2000.
- 3. Watts D J. Small Worlds. Princeton, NJ: Princeton Univ. Press; 1999.
- 4. Milgram S. Psychol Today. 1967;2:60–67.
- 5. Guare J. Six Degrees of Separation. New York: Vintage; 1990.
- 6. White H C. Social Forces. 1970;49:259–264.
- 7. Foster C C, Rapoport A, Orwant C J. Behav Sci. 1963;8:56–65.
- 8. Fararo T J, Sunshine M. A Study of a Biased Friendship Network. Syracuse, NY: Syracuse Univ. Press; 1964.
- 9. Moody J, White D R. Social Cohesion and Embeddedness: A Hierarchical Conception of Social Groups. Santa Fe Institute working paper 00-07-49; 2000.
- 10. Bernard H R, Kilworth P D, Evans M J, McCarty C, Selley G A. Ethnology. 1988;2:155–179.
- 11. Watts D J, Strogatz S H. Nature (London) 1998;393:440–442.
- 12. Albert R, Jeong H, Barabási A-L. Nature (London) 1999;401:130–131.
- 13. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J. Computer Networks. 2000;33:309–320.
- 14. Amaral L A N, Scala A, Barthélémy M, Stanley H. Proc Natl Acad Sci USA. 2000;97:11149–11152. (First Published September 26, 2000; 10.1073/pnas.200327197)
- 15. de Solla Price D. Science. 1965;149:510–515.
- 16. Egghe L, Rousseau R. Introduction to Informetrics. Amsterdam: Elsevier; 1990.
- 17. Melin G, Persson O. Scientometrics. 1996;36:363–377.
- 18. Kretschmer H. Z Sozialpsychol. 1998;29:307–324.
- 19. Ding Y, Foo S, Chowdhury G. Int Inform Lib Rev. 1999;30:367–376.
- 20. Crane D. Invisible Colleges. Chicago: Univ. of Chicago Press; 1972.
- 21. van Raan A F J. Nature (London) 1990;347:626.
- 22. Persson O, Beckmann M. Scientometrics. 1995;33:351–366.
- 23. Hoffman P. The Man Who Loved Only Numbers. New York: Hyperion; 1998.
- 24. Grossman J W, Ion P D F. Congressus Numerantium. 1995;108:129–131.
- 25. Barabási A L, Albert R. Science. 1999;286:509–512.
- 26. Lotka A J. J Wash Acad Sci. 1926;16:317–323.
- 27. O'Connell H B. Physicists Thriving with Paperless Publishing. physics/0007040; 2000.
- 28. Stauffer D, Aharony A. Introduction to Percolation Theory. 2nd Ed. London: Taylor and Francis; 1991.
- 29. Bollobás B. Random Graphs. New York: Academic; 1985.
- 30. Newman M E J, Strogatz S H, Watts D J. Random Graphs with Arbitrary Degree Distribution and Their Applications. 2000; preprint, cond-mat/0007235.


