Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Community structure in complex networks
1. Community structure in complex networks
V.A. Traag
KITLV, Leiden, the Netherlands
e-Humanities, KNAW, Amsterdam, the Netherlands
February 21, 2014
eRoyal Netherlands Academy of Arts and Sciences
Humanities
2. Overview
1 What are communities in networks? How do we find them?
2 Where are those small communities?
3 When are communities significant?
4 What should I remember? And what’s next?
4. What is a community?
• Everybody has an intuitive idea.
• Yet no single agreed upon definition.
• Common core:
Groups of nodes that are
relatively densely connected within, and
relatively sparsely connected between.
5. General community detection
• Reward links inside community,
weight aij
• Punish missing links inside
community, weight bij .
• General quality function
H =
ij
(Aij aij −(1−Aij )bij )δ(σi , σj ).
0
1
2
3
4
5
6
7
8
9
10
11
6. General community detection
• Reward links inside community,
weight aij
• Punish missing links inside
community, weight bij .
• General quality function
H =
ij
(Aij aij −(1−Aij )bij )δ(σi , σj ).
0
1
2
3
4
5
6
7
8
9
10
11
7. General community detection
• Reward links inside community,
weight aij
• Punish missing links inside
community, weight bij .
• General quality function
H =
ij
(Aij aij −(1−Aij )bij )δ(σi , σj ).
0
1
2
3
4
5
6
7
8
9
10
11
8. Different weights
No a-priori constraints on weights aij , bij .
Model aij bij
Reichardt & Bornholdt 1 − bij γpij
Arenas, Fern´andez & G´omez 1 − bij pij (γ) − γδij
Ronhovde & Nussinov 1 γ
Constant Potts Model 1 − γ γ
9. Modularity
• Null-model pij , constraint: ij pij = 2m.
• Popular null-model, configuration model pij =
ki kj
2m .
• With γ = 1, leads to modularity:
Q =
ij
Aij −
ki kj
2m
δ(σi , σj ).
• As sum over communities:
Q =
c
(ec − ec ).
18. Resolution limit
• Modularity might miss ‘small’
communities.
• Merge two cliques in ring of cliques
when
γRB <
q
nc(nc − 1) + 2
.
• Number of communities scales as
√
γRBm.
• For general null model, problem
remains since ij pij = 2m.
19. Resolution limit
• Modularity might miss ‘small’
communities.
• Merge two cliques in ring of cliques
when
γRB <
q
nc(nc − 1) + 2
.
• Number of communities scales as
√
γRBm.
• For general null model, problem
remains since ij pij = 2m.
20. Resolution-limit-free
• Ronhovde & Nussinov model (aij = 1, bij = γ).
• Claim: resolution-limit-free, as merge depends only on ‘local’
variables
γRN <
1
n2
c − 1
.
• But, take pij = ki kj , we obtain
γRB <
1
2(nc(nc − 1) + 2)2
,
also only ‘local’ variables. Hence, also resolution-limit-free?
• Problems of scale remain.
24. Defining resolution-limit-free
Definition (Resolution-limit-free)
Objective function H is called resolution-limit-free if, whenever
partition C optimal for G, then subpartition D ⊂ C also optimal
for subgraph H(D) ⊂ G induced by D.
Theorem (Swap optimal subpartitions)
If C is optimal, with subpartition D, we can replace D by another
optimal subpartition D .
25. Defining resolution-limit-free
Definition (Resolution-limit-free)
Objective function H is called resolution-limit-free if, whenever
partition C optimal for G, then subpartition D ⊂ C also optimal
for subgraph H(D) ⊂ G induced by D.
Theorem (Swap optimal subpartitions)
If C is optimal, with subpartition D, we can replace D by another
optimal subpartition D .
What methods are
resolution-limit-free?
26. Resolution-limit-free methods
• RN and CPM can be easily proven resolution-limit-free.
• What about other weights aij and bij ?
Definition (Local weights)
Weights aij , bij called local whenever for subgraph H ⊂ G, weights
remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H).
Theorem (Local weights ⇒ resolution-limit-free)
Objective function H is resolution-limit-free if weights are local.
27. Resolution-limit-free methods
• RN and CPM can be easily proven resolution-limit-free.
• What about other weights aij and bij ?
Definition (Local weights)
Weights aij , bij called local whenever for subgraph H ⊂ G, weights
remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H).
Theorem (Local weights ⇒ resolution-limit-free)
Objective function H is resolution-limit-free if weights are local.
28. Resolution-limit-free methods
• RN and CPM can be easily proven resolution-limit-free.
• What about other weights aij and bij ?
Definition (Local weights)
Weights aij , bij called local whenever for subgraph H ⊂ G, weights
remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H).
Theorem (Local weights ⇒ resolution-limit-free)
Objective function H is resolution-limit-free if weights are local.
29. Resolution-limit-free methods
• RN and CPM can be easily proven resolution-limit-free.
• What about other weights aij and bij ?
Definition (Local weights)
Weights aij , bij called local whenever for subgraph H ⊂ G, weights
remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H).
Theorem (Local weights ⇒ resolution-limit-free)
Objective function H is resolution-limit-free if weights are local.
Inverse not true: some small perturbation (i.e. non local weight)
will not change optimal partition. But very few exceptions.
30. Resolution-limit-free methods
• RN and CPM can be easily proven resolution-limit-free.
• What about other weights aij and bij ?
Definition (Local weights)
Weights aij , bij called local whenever for subgraph H ⊂ G, weights
remain similar, i.e. aij (G) = λ(H)aij (H) and bij (G) = λ(H)bij (H).
Theorem (Local weights ⇒ resolution-limit-free)
Objective function H is resolution-limit-free if weights are local.
Inverse not true: some small perturbation (i.e. non local weight)
will not change optimal partition. But very few exceptions.
Local methods are
resolution-limit-free.
32. Modularity in non-modular graphs
Modularity as sign of community structure
• Modularity −1 ≤ Q ≤ 1.
• High modularity ⇒ community structure?
• Modularity higher than 0.3 seen as significant.
33. Modularity in non-modular graphs
Modularity as sign of community structure
• Modularity −1 ≤ Q ≤ 1.
• High modularity ⇒ community structure?
• Modularity higher than 0.3 seen as significant.
Many graphs have high modularity,
but no community structure.
37. Significance
E = 14
E = 9
Fixed partition
E = 11
Better partition
• Not: Probability to find E edges in partition.
• But: Probability to find partition with E edges.
38. Subgraph probability
Decompose partition
• Probability to find partition with E edges.
• Probability to find communities with ec edges.
• Asymptotic estimate
• Probability for subgraph of nc nodes with density pc
Pr(S(nc, pc) ⊆ G(n, p)) ≈ exp −n2
cD(pc p)
Significance
• Probability for all communities Pr(σ) ≈
c
exp −n2
cD(pc p) .
• Significance S(σ) = − log Pr(σ) =
c
n2
cD(pc p).
39. Subgraph probability
Decompose partition
• Probability to find partition with E edges.
• Probability to find communities with ec edges.
• Asymptotic estimate
• Probability for subgraph of nc nodes with density pc
Pr(S(nc, pc) ⊆ G(n, p)) ≈ exp −n2
cD(pc p)
Significance
• Probability for all communities Pr(σ) ≈
c
exp −n2
cD(pc p) .
• Significance S(σ) = − log Pr(σ) =
c
n2
cD(pc p).
40. Subgraph probability
Decompose partition
• Probability to find partition with E edges.
• Probability to find communities with ec edges.
• Asymptotic estimate
• Probability for subgraph of nc nodes with density pc
Pr(S(nc, pc) ⊆ G(n, p)) ≈ exp −n2
cD(pc p)
Significance
• Probability for all communities Pr(σ) ≈
c
exp −n2
cD(pc p) .
• Significance S(σ) = − log Pr(σ) =
c
n2
cD(pc p).
41. Subgraph probability
Decompose partition
• Probability to find partition with E edges.
• Probability to find communities with ec edges.
• Asymptotic estimate
• Probability for subgraph of nc nodes with density pc
Pr(S(nc, pc) ⊆ G(n, p)) ≈ exp −n2
cD(pc p)
Significance
• Probability for all communities Pr(σ) ≈
c
exp −n2
cD(pc p) .
• Significance S(σ) = − log Pr(σ) =
c
n2
cD(pc p).
42. Subgraph probability
Decompose partition
• Probability to find partition with E edges.
• Probability to find communities with ec edges.
• Asymptotic estimate
• Probability for subgraph of nc nodes with density pc
Pr(S(nc, pc) ⊆ G(n, p)) ≈ exp −n2
cD(pc p)
Significance
• Probability for all communities Pr(σ) ≈
c
exp −n2
cD(pc p) .
• Significance S(σ) = − log Pr(σ) =
c
n2
cD(pc p).
43. Subgraph probability
Decompose partition
• Probability to find partition with E edges.
• Probability to find communities with ec edges.
• Asymptotic estimate
• Probability for subgraph of nc nodes with density pc
Pr(S(nc, pc) ⊆ G(n, p)) ≈ exp −n2
cD(pc p)
Significance
• Probability for all communities Pr(σ) ≈
c
exp −n2
cD(pc p) .
• Significance S(σ) = − log Pr(σ) =
c
n2
cD(pc p).
47. Conclusions
To remember
• Modularity can hide small communities.
• Local methods avoid this problem (RN, CPM).
• High modularity ⇒ significant: use significance.
What’s next?
• Various measures of significance: what’s the difference?
• Choose “correct” resolution ⇒ resolution limit?
48. Thank you!
Questions?
Traag, Van Dooren & Nesterov
Narrow scope for resolution-limit-free community detection
Phys Rev E 84, 016114 (2011)
Traag, Krings & Van Dooren
Significant scales in community structure
Sci Rep 3, 2930 (2013)
Reichardt & Bornholdt
Statistical mechanics of community detection.
Phys Rev E 74, 016110 (2006)
www.traag.net vincent@traag.net @vtraag