Modularity and community structure in networks

M Newman

Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109

*E-mail: ude.hcimu@njem

Edited by Brian Skyrms, University of California, Irvine, CA, and approved April 19, 2006

Author contributions: M.E.J.N. designed research, performed research, analyzed data, and wrote the paper.

Edited by Brian Skyrms, University of California, Irvine, CA, and approved April 19, 2006

Received 2006 Feb 26

Abstract

Many networks of interest in the sciences, including social networks, computer networks, and metabolic and regulatory networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure is one of the outstanding issues in the study of networked systems. One highly effective approach is the optimization of the quality function known as “modularity” over the possible divisions of a network. Here I show that the modularity can be expressed in terms of the eigenvectors of a characteristic matrix for the network, which I call the modularity matrix, and that this expression leads to a spectral algorithm for community detection that returns results of demonstrably higher quality than competing methods in shorter running times. I illustrate the method with applications to several published network data sets.

Keywords: clustering, partitioning, modules, metabolic network, social network

Abstract

Many systems of scientific interest can be represented as networks, sets of nodes or vertices joined in pairs by lines or edges. Examples include the internet and the worldwide web, metabolic networks, food webs, neural networks, communication and distribution networks, and social networks. The study of networked systems has a history stretching back several centuries, but it has experienced a particular surge of interest in the last decade, especially in the mathematical sciences, partly as a result of the increasing availability of accurate large-scale data describing the topology of networks in the real world. Statistical analyses of these data have revealed some unexpected structural features, such as high network transitivity (1), power-law degree distributions (2), and the existence of repeated local motifs (3); see refs. 4 –6 for reviews.

One issue that has received a considerable amount of attention is the detection and characterization of community structure in networks (7, 8), meaning the appearance of densely connected groups of vertices, with only sparser connections between groups (Fig. 1). The ability to detect such groups could be of significant practical importance. For instance, groups within the worldwide web might correspond to sets of web pages on related topics (9); groups within social networks might correspond to social units or communities (10). Merely the finding that a network contains tightly knit groups at all can convey useful information: if a metabolic network were divided into such groups, for instance, it could provide evidence for a modular view of the network's dynamics, with different groups of nodes performing different functions with some degree of independence (11, 12).

An external file that holds a picture, illustration, etc.
Object name is zpq0220623880001.jpg

Fig. 1.

The vertices in many networks fall naturally into groups or communities, sets of vertices (shaded) within which there are many edges, with only a smaller number of edges between vertices of different groups.

Past work on methods for discovering groups in networks divides into two principal lines of research, both with long histories. The first, which goes by the name of graph partitioning, has been pursued particularly in computer science and related fields, with applications in parallel computing and integrated circuit design, among other areas (13, 14). The second, identified by names such as block modeling, hierarchical clustering, or community structure detection, has been pursued by sociologists and more recently by physicists, biologists, and applied mathematicians, with applications especially to social and biological networks (7, 15, 16).

It is tempting to suggest that these two lines of research are really addressing the same question, albeit by somewhat different means. There are, however, important differences between the goals of the two camps that make quite different technical approaches desirable. A typical problem in graph partitioning is the division of a set of tasks between the processors of a parallel computer so as to minimize the necessary amount of interprocessor communication. In such an application the number of processors is usually known in advance and at least an approximate figure for the number of tasks that each processor can handle. Thus we know the number and size of the groups into which the network is to be split. Also, the goal is usually to find the best division of the network regardless of whether a good division even exists; there is little point in an algorithm or method that fails to divide the network in some cases.

Community structure detection, by contrast, is perhaps best thought of as a data analysis technique used to shed light on the structure of large-scale network data sets, such as social networks, internet and web data, or biochemical networks. Community structure methods normally assume that the network of interest divides naturally into subgroups and the experimenter's job is to find those groups. The number and size of the groups are thus determined by the network itself and not by the experimenter. Moreover, community structure methods may explicitly admit the possibility that no good division of the network exists, an outcome that is itself considered to be of interest for the light it sheds on the topology of the network.

This article focuses on community structure detection in network data sets representing real-world systems of interest. However, both the similarities and differences between community structure methods and graph partitioning will motivate many of the developments that follow.

The networks are, in order, the karate club network of Zachary (23), the network of collaborations between early jazz musicians of Gleiser and Danon (27), a metabolic network for the nematode Caenorhabditis elegans (28), a network of e-mail contacts at a university (29), a trust network of mutual signing of cryptography keys (30), and a coauthorship network of scientists working in condensed matter physics (31). No modularity figure is given for the last network with the GN algorithm because the slow O(n^{) operation of the algorithm prevents its application to such large systems. GN, Glrvan and Newman (}10); CNM, Clauset et al. (26); DA, Duch and Arenas (19).

Acknowledgments

I thank Lada Adamic, Alex Arenas, and Valdis Krebs for graciously providing network data. This work was funded in part by National Science Foundation Grant DMS-0405348 and the James S. McDonnell Foundation.

Acknowledgments

Notes

Note.

After this article was submitted, I was informed of a recent conference presentation by White and Smyth (32) in which a result similar to Eq. 2 was derived and used as the basis for a modularity maximization algorithm quite different from the one presented here. I thank Christian Pich for bringing this presentation to my attention.

Notes

Footnotes

Conflict of interest statement: No conflicts declared.

This paper was submitted directly (Track II) to the PNAS office.

^{Adamic, L. A. & Glance, N., WWW-2005 Workshop on the Weblogging Ecosystem, May 10–14, 2005, Chiba, Japan.}

Footnotes

References

1. Watts D. J., Strogatz S. H. Nature. 1998;393:440–442.[PubMed]
2. Barabási A.-L., Albert R. Science. 1999;286:509–512.[PubMed]
3. Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., Alon U. Science. 2002;298:824–827.[PubMed]
4. Albert R., Barabási A.-L. Rev. Mod. Phys. 2002;74:47–97.[PubMed]
5. Dorogovtsev S. N., Mendes J. F. F. Adv. Phys. 2002;51:1079–1187.[PubMed]
6. Newman M. E. J. SIAM Rev. 2003;45:167–256.[PubMed]
7. Newman M. E. J. Eur. Phys. J. B. 2004;38:321–330.[PubMed]
8. Danon L., Duch J., Diaz-Guilera A., Arenas A. J. Stat. Mech. 2005:P09008.[PubMed]
9. Flake G. W., Lawrence S. R., Giles C. L., Coetzee F. M. IEEE Computer. 2002;35:66–71.[PubMed]
10. Girvan M., Newman M. E. J. Proc. Natl. Acad. Sci. USA. 2002;99:7821–7826.
11. Holme P., Huss M., Jeong H. Bioinformatics. 2003;19:532–538.[PubMed]
12. Guimerà R., Amaral L. A. N. Nature. 2005;433:895–900.
13. Elsner U Graph Partitioning: A Survey. Chemnitz, Germany: Technische Universität Chemnitz; 1997. Technical Report 97-27. [PubMed][Google Scholar]
14. Fjällström P.-O. Linköping Electronic Articles in Computer and Information Science. 1998. Available at . Accessed May 10, 2006.[PubMed]
15. White H. C., Boorman S. A., Breiger R. L. Am. J. Sociol. 1976;81:730–779.[PubMed]
16. Wasserman S., Faust K Social Network Analysis. Cambridge, U.K.: Cambridge Univ. Press; 1994. [PubMed][Google Scholar]
17. Newman M. E. J., Girvan M. Phys. Rev. E. 2004;69:026113.[PubMed]
18. Newman M. E. J. Phys. Rev. E. 2004;69:066133.[PubMed]
19. Duch J., Arenas A. Phys. Rev. E. 2005;72:027104.[PubMed]
20. Chung F. R. K. Spectral Graph Theory, CBMS Regional Conference Series in Mathematics; Providence, RI: Am. Math. Soc.; 1997. no. 92. [PubMed]
21. Fiedler M. Czech. Math. J. 1973;23:298–305.[PubMed]
22. Pothen A., Simon H., Liou K.-P. SIAM J. Matrix Anal. Appl. 1990;11:430–452.[PubMed]
23. Zachary WW. J. Anthropol. Res. 1977;33:452–473.[PubMed][Google Scholar]
24. Radicchi F., Castellano C., Cecconi F., Loreto V., Parisi D. Proc. Natl. Acad. Sci. USA. 2004;101:2658–2663.
25. Kernighan BW., Lin S. Bell System Tech. J. 1970;49:291–307.[PubMed][Google Scholar]
26. Clauset A., Newman M. E. J., Moore C. Phys. Rev. E. 2004;70:066111.[PubMed]
27. Gleiser P., Danon L. Adv. Complex Systems. 2003;6:565–573.[PubMed]
28. Jeong H., Tombor B., Albert R., Oltvai ZN., Barabási A.-L. Nature. 2000;407:651–654.[PubMed][Google Scholar]
29. Guimerà R., Danon L., Díaz-Guilera A., Giralt F., Arenas A. Phys. Rev. E. 2003;68:065103.[PubMed]
30. Guardiola X., Guimerà R., Arenas A., Diaz-Guilera A., Streib D., Amaral L. A. N. e-Print Archive. 2002. .[PubMed]
31. Newman M. E. J. Proc. Natl. Acad. Sci. USA. 2001;98:404–409.
32. White S., Smyth P. In: Kargupta H., Srivastava J., Kamath C., Goodman A., editors. Proceedings of the 5th SIAM International Conference on Data Mining; Philadelphia: Society for Industrial and Applied Mathematics; 2005. pp. 274–285. [PubMed]