## Example Overview

【Summary】

Complex networks, algorithms: Newman's fast algorithm (n² running time)
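The summary line above refers to Newman's fast algorithm and its n² scaling. The pages excerpted below only report results, so here is a minimal Python sketch of the greedy agglomerative idea behind the method. It is written from the general description of the technique, not taken from the paper. Every vertex starts in its own community. At each step the pair of communities joined by at least one edge whose merger gives the largest change in the modularity `Q = sum_i (e_ii - a_i**2)` is merged. Here `e_ij` is the fraction of edge ends joining communities i and j, and `a_i = sum_j e_ij`. Merging i and j changes Q by `2 * (e_ij - a_i * a_j)`. The function name `greedy_modularity` and the edge-list input format are illustrative assumptions. This naive version also rescans all candidate pairs on every join, so it shows the logic rather than reproducing the paper's timings.

```python
from collections import defaultdict


def greedy_modularity(n, edges):
    """Greedy agglomerative modularity maximization (illustrative sketch).

    n     -- number of vertices, labelled 0..n-1
    edges -- undirected (u, v) pairs, no self-loops or duplicate edges
    Returns (best_q, best_partition); best_partition is a list of vertex sets.
    """
    if not edges:
        raise ValueError("modularity is undefined for a graph with no edges")
    m = len(edges)
    # e[i][j]: fraction of edge ends joining community i to community j;
    # each edge contributes half to e[u][v] and half to e[v][u].
    e = defaultdict(lambda: defaultdict(float))
    for u, v in edges:
        e[u][v] += 0.5 / m
        e[v][u] += 0.5 / m
    a = {i: sum(e[i].values()) for i in range(n)}  # a_i = sum_j e_ij
    members = {i: {i} for i in range(n)}
    q = sum(e[i][i] - a[i] ** 2 for i in range(n))  # Q with every vertex alone
    best_q, best = q, [set(s) for s in members.values()]
    while True:
        # Only communities joined by at least one edge are candidates, so
        # vertices with no connecting path are never merged.
        candidates = [(2 * (e[i][j] - a[i] * a[j]), i, j)
                      for i in e for j in e[i] if i < j and e[i][j] > 0]
        if not candidates:
            break
        dq, i, j = max(candidates)  # largest increase (or smallest decrease) in Q
        # Fold community j into community i: rows and columns of e add up.
        row_j = e.pop(j)
        e[i][i] += row_j.pop(j, 0.0) + 2 * row_j.pop(i, 0.0)
        e[i].pop(j, None)
        for k, w in row_j.items():
            e[i][k] += w
            e[k][i] += w
            e[k].pop(j, None)
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:  # remember the peak of Q along the dendrogram
            best_q, best = q, [set(s) for s in members.values()]
    return best_q, best
```

For instance, two triangles joined by a single edge, `greedy_modularity(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])`, should return the two triangles with a peak Q of 5/14 (about 0.357). Joins continue past the peak so that the whole dendrogram is traversed, and the best level is kept.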

FAST ALGORITHM FOR DETECTING COMMUNITY …  PHYSICAL REVIEW E 69, 066133 (2004)

… however, the Girvan-Newman (GN) algorithm seems to have the edge, and this should come as no great surprise. Our algorithm bases its decisions on purely local information about individual communities, while the GN algorithm uses nonlocal information about the entire network, namely information derived from betweenness scores. Since community structure is itself fundamentally a nonlocal quantity, it seems reasonable that one can do a better job of finding that structure if one has nonlocal information at one's disposal. For systems small enough that the GN algorithm is computationally tractable, therefore, we see no reason not to continue using it: it appears to give the best results. For systems too large to make use of this approach, however, our algorithm gives useful community structure information with comparatively little effort.

We have applied our algorithm to a variety of real-world networks as well. One example is the "karate club" network studied in [5], which represents friendships between 34 members of a club at a U.S. university, as recorded over a two-year period by Zachary [17]. During the course of the study, the club split into two groups as a result of a dispute within the organization, and the members of one group left to start their own club. In Fig. 2 we show the dendrogram derived by feeding the friendship network into our algorithm. The peak modularity is Q = 0.381 and corresponds to a split into two groups of 17, as shown in the figure. The shapes of the vertices represent the alignments of the club members following the dispute and, as we can see, the division found by the algorithm corresponds almost perfectly to these alignments; only one vertex, number 10, is classified wrongly. The GN algorithm performs similarly on this task, but not better: it also finds the split but classifies one vertex wrongly (although a different one, vertex 3). In other tests, we find that our algorithm also successfully detects the main two-way division of the dolphin social network of Lusseau [6,18], and the division between black and white musicians in the jazz network of Gleiser and Danon [11].

FIG. 2. Dendrogram of the communities found by our algorithm in the "karate club" network of Zachary [5,17]. The shapes of the vertices represent the two groups into which the club split as the result of an internal dispute.

As a demonstration of how our algorithm can sometimes miss some of the structure in a network, we take another example from Ref. [5], a network representing the schedule of games between American college football teams in a single season. Because the teams are divided into groups or "conferences," with intraconference games being more frequent than interconference games, we have a reasonable idea ahead of time about what communities our algorithm should find. The dendrogram generated by the algorithm is shown in Fig. 3, and has an optimal modularity of Q = 0.546, which is a little shy of the value 0.601 for the best split reported in [5]. As the dendrogram reveals, the algorithm finds six communities. Some of them correspond to single conferences, but most correspond to two or more. The GN algorithm, by contrast, finds all 11 conferences, as well as accurately identifying independent teams that belong to no conference. Nonetheless, it is clear that our algorithm is quite capable of picking out useful community structure from the network, and of course it is much the faster algorithm. On the author's desktop computer the algorithm ran to completion in an immeasurably small time, less than a hundredth of a second; the algorithm of Girvan and Newman took a little over a second.

A time difference of this magnitude will not present a big problem in most practical situations, but performance rapidly becomes an issue when we look at larger networks; we expect the ratio of running times to increase with the number of vertices. Thus, for example, in applying our algorithm to the 1275-node network of jazz musician collaborations mentioned above, we found that it runs to completion in about one second of CPU time. The GN algorithm, by contrast, takes more than three hours to reach very similar results.

As an example of an analysis made possible by the speed of our algorithm, we have looked at a network of collaborations between physicists as documented by papers posted on the widely used Physics E-print Archive at arxiv.org. The network is an updated version of the one described in Ref. [13], in which scientists are considered connected if they have coauthored one or more papers posted on the archive. We analyze only the largest component of the network, which contains n = 56 276 scientists in all branches of physics covered by the archive. Since two vertices that are unconnected by any path are never put in the same community by our algorithm, the small fraction of vertices that are not part of the largest component can safely be assumed to be in separate communities in the sense of our algorithm. Our algorithm takes 42 min to find the full community structure. Our best estimates indicate that the GN algorithm would take somewhere between three and five years to complete its version of the same calculation.
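The paragraph above restricts the analysis to the largest component, relying on the fact that the algorithm never merges vertices with no path between them. A minimal Python sketch of that preprocessing step follows, assuming the network is given as an edge list over vertices 0..n-1. It finds components by breadth-first search and relabels the chosen vertex set so that it can be fed back into a community-detection routine. The same relabelling is one way to rerun the analysis on a single subcommunity, as is done later for the condensed-matter subgroup. Both helper names, `largest_component` and `induced_subgraph`, are hypothetical.

```python
from collections import deque


def largest_component(n, edges):
    """Return the vertex set of the largest connected component (BFS)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    best = set()
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects everything reachable from `start`.
        comp = {start}
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    comp.add(w)
                    queue.append(w)
        if len(comp) > len(best):
            best = comp
    return best


def induced_subgraph(vertices, edges):
    """Relabel `vertices` as 0..k-1 and keep only edges with both ends inside."""
    index = {v: i for i, v in enumerate(sorted(vertices))}
    sub = [(index[u], index[v]) for u, v in edges if u in index and v in index]
    return index, sub
```

Keeping the `index` mapping makes it possible to translate communities found in the relabelled subgraph back to the original vertex identifiers.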
M. E. J. NEWMAN  PHYSICAL REVIEW E 69, 066133 (2004)

FIG. 3. Dendrogram of the communities found in the college football network described in the text. The real-world communities (the conferences) are denoted by the different shapes, as indicated in the legend.

The analysis reveals that the network in question consists of about 600 communities, with a high peak modularity of Q = 0.713, indicating strong community structure in the physics world. Four of the communities found are large, containing between them 77% of all the vertices, while the others are small (see Fig. 4, left panel). The four large communities correspond closely to subject subareas: one to astrophysics, one to high-energy physics, and two to condensed-matter physics. Thus there appears to be a strong correlation between the structure found by our algorithm and the community divisions perceived by human observers. It is precisely correlation of this kind that makes community structure analysis a useful tool in understanding the behavior of networked systems.

We can repeat the analysis with any of the subcommunities to observe how they break up. For example, feeding the smaller of the two condensed-matter groups through the algorithm again, we find an even stronger peak modularity of Q = 0.807 (the strongest we have yet observed in any network), corresponding to a split into about 100 communities (Fig. 4, center panel). These communities have a broad distribution of sizes, from 3 to nearly 2000. The distribution is shown in cumulative form in Fig. 5, and we observe that it is approximately power law in form with exponent about 1.6, although this conclusion should be treated with caution: there is significant deviation from a perfect power law [20].

Narrowing our focus still further to the particular one of these communities that contains the present author, we find the structure shown in the right panel of Fig. 4. Feeding this one last time through the algorithm, it breaks apart into communities that correspond closely to individual institutional research groups, the author's group appearing in the corner of the figure, highlighted by the dashed box. One could pursue this line of analysis further, identifying individual groups, iteratively breaking them down, and looking, for example, at the patterns of collaboration between them, but we leave this for later study.

FIG. 4. Left panel: Community structure in the collaboration network of physicists. The graph breaks down into four large groups, each composed primarily of physicists of one specialty, as shown. Specialties are determined by the subsection(s) of the e-print archive in which individuals post papers: "C.M." indicates condensed matter; "H.E.P." indicates high-energy physics, including theory, phenomenology, and nuclear physics; "astro" indicates astrophysics. Middle panel: One of the condensed-matter communities is further broken down by the algorithm, revealing an approximate power-law distribution of community sizes. Right panel: One of these smaller communities is further analyzed to reveal individual research groups (different shades), one of which (in the dashed box) is the author's own. [Panel titles: "Physics E-print Archive, 56 276 vertices" (four large groups plus about 600 smaller communities); "Mostly condensed matter, 9350 vertices" (power-law distribution of group sizes); "Subgroup, 134 vertices" (a single research group of 28 vertices highlighted).]

FIG. 5. Cumulative distribution function of the sizes of communities found in one of the subnetworks of the physics collaboration graph, as described in the text. The dotted line represents the slope the plot would have if the distribution followed a power law with exponent 1.6. [Plot: horizontal axis "size of community s", logarithmic.]

IV. CONCLUSIONS

In this paper we have described an algorithm for extracting community structure from networks, which has a considerable speed advantage over previous algorithms, running to completion in a time that scales as the square of the network size. This allows us to study much larger systems than has previously been possible. Among other examples, we have applied the algorithm to a network of collaborations between more than 50 000 physicists, and found that the resulting community structure corresponds closely to the traditional divisions between specialties and research groups.

We believe that our method will not only allow for the extension of community structure analysis to some of the very large networks that are now being studied for the first time, but will also provide a useful tool for visualizing and understanding the structure of these networks, whose daunting size has hitherto made many of their structural properties obscure.

ACKNOWLEDGMENTS

The author thanks Leon Danon, Pablo Gleiser, David Lusseau, and Douglas White for providing network data used in the examples. This work was supported in part by the National Science Foundation under Grant No. DMS-0234188.

- [1] S. H. Strogatz, Nature (London) 410, 268 (2001).
- [2] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
- [3] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks: From Biological Nets to the Internet and WWW (Oxford University Press, Oxford, 2003).
- [4] M. E. J. Newman, SIAM Rev. 45, 167 (2003).
- [5] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
- [6] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
- [7] D. Wilkinson and B. A. Huberman, Proc. Natl. Acad. Sci. U.S.A. 101, 5241 (2004).
- [8] P. Holme, M. Huss, and H. Jeong, Bioinformatics 19, 532 (2003).
- [9] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, Phys. Rev. E 68, 065103 (2003).
- [10] J. R. Tyler, D. M. Wilkinson, and B. A. Huberman, in Proceedings of the First International Conference on Communities and Technologies, edited by M. Huysman, E. Wenger, and V. Wulf (Kluwer, Dordrecht, 2003).
- [11] P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003).
- [12] S. Redner, Eur. Phys. J. B 4, 131 (1998).
- [13] M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 98, 404 (2001).
- [14] J. Kleinberg and S. Lawrence, Science 294, 1849 (2001).
- [15] B. Everitt, Cluster Analysis (John Wiley, New York, 1974).
- [16] J. Scott, Social Network Analysis: A Handbook, 2nd ed. (Sage, London, 2000).
- [17] W. W. Zachary, J. Anthropol. Res. 33, 452 (1977).
- [18] D. Lusseau, Proc. R. Soc. London, Ser. B 270, S186 (2003).
- [19] The criterion for deciding correct classification is as follows. We find the largest set of vertices that are grouped together by the algorithm in each of the four known communities. If the algorithm puts two or more of these sets in the same group, then all vertices in those sets are considered incorrectly classified. Otherwise, they are considered correctly classified. All other vertices not in the largest sets are considered incorrectly classified. This criterion is quite harsh: there are cases in which one might consider some of the vertices to have been identified correctly, where this method would not. Even with this harsh definition, however, our algorithm performs well, and a laxer definition would only make its performance more impressive.
- [20] This power law is different from the one observed in an email network by Guimerà et al. [9]. They studied the histogram of community sizes over all levels of the dendrogram; we are looking only at the single level corresponding to the maximum value of Q.
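Footnote [19] spells out the criterion used to score classifications against the four known communities. One way to express it in Python is sketched below. Several details are assumptions because the footnote does not specify them: the function name `fraction_correct`, the input format (a list of known communities as vertex sets, plus a dict mapping each vertex to the label of the group the algorithm put it in), and the tie-breaking rule. When two labels are equally common inside a known community, this sketch takes whichever one `Counter.most_common` lists first.

```python
from collections import Counter


def fraction_correct(known, found):
    """Score a found partition against known communities (harsh criterion).

    known -- list of sets of vertices, one set per true community
    found -- dict mapping each vertex to the label of its found group
    """
    largest = []  # (found label, largest co-grouped vertex set) per community
    for community in known:
        counts = Counter(found[v] for v in community)
        # Ties resolved by Counter.most_common order (an assumption).
        label, _ = counts.most_common(1)[0]
        largest.append((label, {v for v in community if found[v] == label}))
    # A largest set counts as correct only if no other largest set shares its group.
    label_uses = Counter(label for label, _ in largest)
    correct = sum(len(s) for label, s in largest if label_uses[label] == 1)
    # Every vertex outside the largest sets is counted as incorrect.
    total = sum(len(c) for c in known)
    return correct / total
```

If the algorithm lumps two known communities into one group, both of their largest sets share a label and every vertex in them is scored as wrong. This is the harshness the footnote describes.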
