Resolution limit in community detection

Santo Fortunato

Marc Barthélemy

^{School of Informatics and Center for Biocomplexity, Indiana University, Bloomington, IN 47406;}

^{Fakultät für Physik, Universität Bielefeld, D-33501 Bielefeld, Germany;}

^{Complex Networks Lagrange Laboratory (CNLL), ISI Foundation, 10133 Torino, Italy; and}

^{Commissariat à l'Energie Atomique–Département de Physique Théorique et Appliquée, 91680 Bruyeres-Le-Chatel, France}

^{To whom correspondence should be addressed. E-mail:}rf.aec@ymelehtrab.cram

Edited by David O. Siegmund, Stanford University, Stanford, CA, and approved November 6, 2006

Author contributions: S.F. and M.B. designed research, performed research, analyzed data, and wrote the paper.

Edited by David O. Siegmund, Stanford University, Stanford, CA, and approved November 6, 2006

Received 2006 Jul 17

Abstract

Detecting community structure is fundamental for uncovering the links between structure and function in complex networks and for practical applications in many disciplines such as biology and sociology. A popular method now widely used relies on the optimization of a quantity called modularity, which is a quality index for a partition of a network into communities. We find that modularity optimization may fail to identify modules smaller than a scale which depends on the total size of the network and on the degree of interconnectedness of the modules, even in cases where modules are unambiguously defined. This finding is confirmed through several examples, both in artificial and in real social, biological, and technological networks, where we show that modularity optimization indeed does not resolve a large number of modules. A check of the modules obtained through modularity optimization is thus necessary, and we provide here key elements for the assessment of the reliability of this community detection method.

Keywords: complex networks, modular structure, metabolic networks, social networks

Abstract

Community detection in complex networks has attracted a lot of attention in recent years (for a review, see refs. 1 and 2). The main reason is that complex networks (3 –7) are made of a large number of nodes and most previous quantitative investigations focused on statistical properties disregarding the roles played by specific subgraphs. Detecting communities (or modules) can be a way to identify substructures which could correspond to important functions. This is, for example, confirmed in the case of the World Wide Web, where communities are sets of Web pages dealing with the same topic (8). In biological networks, it is widely believed that the modular structure results from evolutionary constraints and plays a crucial role in biological functions (9 –11), which makes community detection very relevant (12 –14). Relevant community structures were also found in social networks (15 –17), the Internet (18), food webs (19, 20), and in networks of sexual contacts (21, 22).

Loosely speaking, a community is a subgraph of a network whose nodes are more tightly connected with each other than with nodes outside the subgraph. A decisive advance in community detection was made by Newman and Girvan (23), who introduced a quantitative measure for the quality of a partition of a network into communities, the modularity. This measure essentially compares the number of links inside a given module with the expected value for a randomized graph of the same size and same degree sequence. If one chooses modularity as the relevant quality function, the problem of community detection becomes equivalent to modularity optimization. The latter is not trivial, as the number of possible partitions of a network into clusters increases at least exponentially with the size of the network, making exhaustive optimization computationally unfeasible even for relatively small graphs. Therefore, a number of algorithms have been devised to find a good optimization technique with the smallest computational cost possible. The fastest available procedures use greedy techniques (24, 25) and extremal optimization (26), which are, at the present time, the only algorithms capable of detecting communities in large networks. More accurate results are obtained through simulated annealing (27, 28), but this method is computationally very expensive.

Modularity optimization seems, therefore, to be a very effective method to detect communities, both in real and in artificially generated networks. However, modularity itself has not yet been thoroughly investigated, and only a few general properties are known. For example, it is known that the modularity value of a partition does not have a meaning by itself, but only when compared with the corresponding modularity expected for a random graph of the same size (29), as the latter may attain very high values due to fluctuations (27).

In this article, we present a critical analysis of modularity and of the applicability of modularity optimization to the problem of community detection. We show that modularity contains an intrinsic scale that depends on the total number of links in the network. Modules that are smaller than this scale may not be resolved, even in the extreme case where they are complete graphs connected by single bridges. The resolution limit of modularity actually depends on the degree of interconnectedness between pairs of communities and can reach values of the order of the size of the whole network. Tests performed on several artificial and real networks clearly show that this problem is likely to occur.

It is thus a priori impossible to tell whether a module (large or small), detected through modularity optimization, is indeed a single module or a cluster of smaller modules. This raises doubts about the effectiveness of modularity optimization in community detection, and more generally about the applicability of quality functions.

In the second column, we report the number of modules detected in the partition obtained for the maximal modularity. However, these modules contain submodules; in the third column we report the total number of submodules we found and the corresponding value of the modularity of the partition, which is lower than the peak modularity initially found.

Acknowledgments

We thank A. Barrat, C. Castellano, V. Colizza, E. Flach, A. Flammini, J. Kertész, F. Menczer, and A. Vespignani for enlightening discussions and suggestions, and U. Alon for providing the network data.

Acknowledgments

Footnotes

The authors declare no conflict of interest.

This article is a PNAS direct submission.

Footnotes

References

1. Newman MEJ. Eur Phys J B. 2004;38:321–330.[PubMed]
2. Danon L, Díaz-Guilera A, Duch J, Arenas A. J Stat Mech. 2005 P09008. [PubMed]
3. Barabási A-L, Albert R. Rev Mod Phys. 2002;74:47–97.[PubMed]
4. Dorogovtsev SN, Mendes JFF Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford: Oxford Univ Press; 2003. [PubMed][Google Scholar]
5. Newman MEJ. SIAM Rev. 2003;45:167–256.[PubMed]
6. Pastor-Satorras R, Vespignani A Evolution and Structure of the Internet: A Statistical Physics Approach. Cambridge, UK: Cambridge Univ Press; 2004. [PubMed][Google Scholar]
7. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D-U. Phys Rep. 2006;424:175–308.[PubMed]
8. Flake GW, Lawrence S, Lee Giles C, Coetzee FM. IEEE Computer. 2002;35:66–71.[PubMed]
9. Hartwell LH, Hopfield JJ, Leibler S, Murray AW. Nature. 1999;499:C47–C52.[PubMed]
10. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási A-L. Science. 2002;297:1551–1555.[PubMed]
11. Papin JA, Reed JL, Palsson BO. Trends Biochem Sci. 2004;29:641–647.[PubMed]
12. Holme P, Huss M, Jeong H. Bioinformatics. 2003;19:532–538.[PubMed]
13. Guimerà R, Amaral LAN. Nature. 2005;433:895–900.
14. Palla G, Derényi I, Farkas I, Vicsek T. Nature. 2005;435:814–818.[PubMed]
15. Girvan M, Newman MEJ. Proc Natl Acad Sci USA. 2002;99:7821–7826.
16. Lusseau D, Newman MEJ. Proc R Soc London B. 2004;271:S477–S481.
17. Adamic L, Glance N Proc 3rd Int Workshop on Link Discovery. Los Angeles: Information Sciences Inst, Univ of Southern California; 2005. pp. 36–43. [PubMed][Google Scholar]
18. Eriksen K, Simonsen I, Maslov S, Sneppen K. Phys Rev Lett. 2003;90:148701.[PubMed]
19. Pimm SL. Theor Popul Biol. 1979;16:144–158.[PubMed]
20. Krause A-E, Frank KA, Mason DM, Ulanowicz RE, Taylor WW. Nature. 2003;426:282–285.[PubMed]
21. Garnett GP, Hughes JP, Anderson RM, Stoner BP, Aral SO, Whittington WL, Handsfield HH, Holmes KK. Sexually Transmitted Diseases. 1996;23:248–257.[PubMed]
22. Aral SO, Hughes JP, Stoner BP, Whittington WL, Handsfield HH, Anderson RM, Holmes KK. Am J Public Health. 1999;89:825–833.
23. Newman MEJ, Girvan M. Phys Rev E. 2004;69 026113. [PubMed]
24. Newman MEJ. Phys Rev E. 2004;69 066133. [PubMed]
25. Clauset A, Newman MEJ, Moore C. Phys Rev E. 2004;70 066111. [[PubMed]
26. Duch J, Arenas A. Phys Rev E. 2005;72 027104. [[PubMed]
27. Guimerà R, Sales-Pardo M, Amaral LAN. Phys Rev E. 2004;70 025101(R) [PubMed]
28. Reichardt J, Bornholdt S. Phys Rev E. 2006;74 016110. [[PubMed]
29. Reichardt J, Bornholdt S. 2006. arXiv:cond-mat/0606220.
30. Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D. Proc Natl Acad Sci USA. 2004;101:2658–2663.
31. Muff S, Rao F, Caflisch A. Phys Rev E. 2005;72 056107. [[PubMed]
32. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Science. 2002;298:824–827.[PubMed]
33. Danon L, Díaz-Guilera A, Arenas A. 2006. arXiv:physics/0601144.