Community structure in complex networks

1. Community structure in complex networks V.A. Traag KITLV, Leiden, the Netherlands e-Humanities, KNAW, Amsterdam, the Netherlands February 21, 2014 eRoyal Netherlands Academy of Arts and Sciences Humanities

2. Overview 1 What are communities in networks? How do we ﬁnd them? 2 Where are those small communities? 3 When are communities signiﬁcant? 4 What should I remember? And what’s next?

3. Part I What are communities? How do we ﬁnd them?

4. What is a community? • Everybody has an intuitive idea. • Yet no single agreed upon deﬁnition. • Common core: Groups of nodes that are relatively densely connected within, and relatively sparsely connected between.

5. General community detection • Reward links inside community, weight aij • Punish missing links inside community, weight bij . • General quality function H = ij (Aij aij −(1−Aij )bij )δ(σi , σj ). 0 1 2 3 4 5 6 7 8 9 10 11

8. Different weights No a-priori constraints on weights aij , bij . Model aij bij Reichardt & Bornholdt 1 − bij γpij Arenas, Fernández & Gómez 1 − bij pij (γ) − γδij Ronhovde & Nussinov 1 γ Constant Potts Model 1 − γ γ

9. Modularity • Null-model pij , constraint: ij pij = 2m. • Popular null-model, conﬁguration model pij = ki kj 2m . • With γ = 1, leads to modularity: Q = ij Aij − ki kj 2m δ(σi , σj ). • As sum over communities: Q = c (ec − ec ).

10. Optimising modularity Initial communities 0 1 2 3 4 5 6 7 8 9 10 11

11. Optimising modularity Initial communities Move 0 0 1 2 3 4 5 6 7 8 9 10 11

12. Optimising modularity Initial communities Move 0 Move 5 0 1 2 3 4 5 6 7 8 9 10 11

13. Optimising modularity Initial communities Move 0 Move 5 Move 11 0 1 2 3 4 5 6 7 8 9 10 11

14. Optimising modularity Initial communities Move 0 Move 5 Move 11 0 1 2 3 4 5 6 7 8 9 10 11 No more improvement

15. Optimising modularity Initial communities Move 0 Move 5 Move 11 0 1 2 1 1 1 3 6 5 Aggregate graph, and repeat same procedure.

16. Optimising modularity Initial communities Move 0 Move 5 Move 11 0 1 2 1 1 1 3 6 5 Aggregate graph, and repeat same procedure. Louvain algorithm 1 Move node i to best (greedy) community. 2 Repeat (1) until no more improvement. 3 Contract graph (communities → nodes). 4 Repeat (1)-(3) until no more improvement.

17. Part II Where are those small communities?

18. Resolution limit • Modularity might miss ‘small’ communities. • Merge two cliques in ring of cliques when γRB < q nc(nc − 1) + 2 . • Number of communities scales as √ γRBm. • For general null model, problem remains since ij pij = 2m.

19. Resolution limit • Modularity might miss ‘small’ communities. • Merge two cliques in ring of cliques when γRB < q nc(nc − 1) + 2 . • Number of communities scales as √ γRBm. • For general null model, problem remains since ij pij = 2m.

20. Resolution-limit-free • Ronhovde & Nussinov model (aij = 1, bij = γ). • Claim: resolution-limit-free, as merge depends only on ‘local’ variables γRN < 1 n2 c − 1 . • But, take pij = ki kj , we obtain γRB < 1 2(nc(nc − 1) + 2)2 , also only ‘local’ variables. Hence, also resolution-limit-free? • Problems of scale remain.

21. Resolution limit

22. Resolution limit

23. Resolution limit Resolution limit Resolution-limit-free

24. Deﬁning resolution-limit-free Deﬁnition (Resolution-limit-free) Objective function H is called resolution-limit-free if, whenever partition C optimal for G, then subpartition D ⊂ C also optimal for subgraph H(D) ⊂ G induced by D. Theorem (Swap optimal subpartitions) If C is optimal, with subpartition D, we can replace D by another optimal subpartition D .

25. Deﬁning resolution-limit-free Deﬁnition (Resolution-limit-free) Objective function H is called resolution-limit-free if, whenever partition C optimal for G, then subpartition D ⊂ C also optimal for subgraph H(D) ⊂ G induced by D. Theorem (Swap optimal subpartitions) If C is optimal, with subpartition D, we can replace D by another optimal subpartition D . What methods are resolution-limit-free?

26. Resolution-limit-free methods • RN and CPM can be easily proven resolution-limit-free. • What about other weights aij and bij ? Deﬁnition (Local weights) Weights aij , bij called local whenever for subgraph H ⊂ G, weights remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H). Theorem (Local weights ⇒ resolution-limit-free) Objective function H is resolution-limit-free if weights are local.

29. Resolution-limit-free methods • RN and CPM can be easily proven resolution-limit-free. • What about other weights aij and bij ? Deﬁnition (Local weights) Weights aij , bij called local whenever for subgraph H ⊂ G, weights remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H). Theorem (Local weights ⇒ resolution-limit-free) Objective function H is resolution-limit-free if weights are local. Inverse not true: some small perturbation (i.e. non local weight) will not change optimal partition. But very few exceptions.

30. Resolution-limit-free methods • RN and CPM can be easily proven resolution-limit-free. • What about other weights aij and bij ? Deﬁnition (Local weights) Weights aij , bij called local whenever for subgraph H ⊂ G, weights remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H). Theorem (Local weights ⇒ resolution-limit-free) Objective function H is resolution-limit-free if weights are local. Inverse not true: some small perturbation (i.e. non local weight) will not change optimal partition. But very few exceptions. Local methods are resolution-limit-free.

31. Part II When are communities signiﬁcant?

32. Modularity in non-modular graphs Modularity as sign of community structure • Modularity −1 ≤ Q ≤ 1. • High modularity ⇒ community structure? • Modularity higher than 0.3 seen as signiﬁcant.

33. Modularity in non-modular graphs Modularity as sign of community structure • Modularity −1 ≤ Q ≤ 1. • High modularity ⇒ community structure? • Modularity higher than 0.3 seen as signiﬁcant. Many graphs have high modularity, but no community structure.

34. Modularity without community structure Q = 0.31 Modularity Q ≈ 0 for random graphs.

35. Signiﬁcance How signiﬁcant is a partition?

36. Signiﬁcance E = 14 E = 9 Fixed partition E = 11 Better partition

37. Significance E = 14 E = 9 Fixed partition E = 11 Better partition • Not: Probability to find E edges in partition. • But: Probability to find partition with E edges.

38. Subgraph probability Decompose partition • Probability to find partition with E edges. • Probability to find communities with ec edges. • Asymptotic estimate • Probability for subgraph of nc nodes with density pc Pr(S(nc, pc) ⊆ G(n, p)) ≈ exp −n2 cD(pc p) Significance • Probability for all communities Pr(σ) ≈ c exp −n2 cD(pc p) . • Significance S(σ) = − log Pr(σ) = c n2 cD(pc p).

44. Signiﬁcance 10−3 10−2 10−1 100 103 104 105 106 γ N E

45. Signiﬁcance 10−3 10−2 10−1 100 103 104 105 106 γ N E S

46. Final Chapter What should I remember? And what’s next?

47. Conclusions To remember • Modularity can hide small communities. • Local methods avoid this problem (RN, CPM). • High modularity ⇒ significant: use significance. What’s next? • Various measures of significance: what’s the difference? • Choose “correct” resolution ⇒ resolution limit?

48. Thank you! Questions? Traag, Van Dooren & Nesterov Narrow scope for resolution-limit-free community detection Phys Rev E 84, 016114 (2011) Traag, Krings & Van Dooren Signiﬁcant scales in community structure Sci Rep 3, 2930 (2013) Reichardt & Bornholdt Statistical mechanics of community detection. Phys Rev E 74, 016110 (2006) www.traag.net vincent@traag.net @vtraag

Community structure in complex networks

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (19)

Similar to Community structure in complex networks

Similar to Community structure in complex networks (20)

More from Vincent Traag

More from Vincent Traag (20)

Recently uploaded

Recently uploaded (20)

Community structure in complex networks