python - Spectral clustering a graph in Python

Tags: python scikit-learn cluster-analysis graph-theory spectral

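I want to cluster a graph in Python using spectral clustering.

Spectral clustering is a more general technique that can be applied not only to graphs but also to images or any other kind of data; however, it is considered an exceptional graph clustering technique. Sadly, I can't find any examples online of spectral clustering applied to graphs in Python.

I would love to know how to go about this. If someone can help me figure it out, I can add the documentation to scikit-learn.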

Best answer

I don't have much experience with spectral clustering and simply followed the docs (skip to the end for the results!):

Code:

import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn import metrics
np.random.seed(1)

# Get your mentioned graph
G = nx.karate_club_graph()

# Get ground-truth: club-labels -> transform to 0/1 np-array
#     (possible overcomplicated networkx usage here)
gt_dict = nx.get_node_attributes(G, 'club')
gt = [gt_dict[i] for i in G.nodes()]
gt = np.array([0 if i == 'Mr. Hi' else 1 for i in gt])

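# Get adjacency-matrix as numpy-array
#     (nx.to_numpy_matrix was removed in networkx 3.0; weight=None gives a
#      plain 0/1 connectivity matrix even if the graph carries edge weights)
adj_mat = nx.to_numpy_array(G, weight=None)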

print('ground truth')
print(gt)

# Cluster
sc = SpectralClustering(2, affinity='precomputed', n_init=100)
sc.fit(adj_mat)

# Compare ground-truth and clustering-results
print('spectral clustering')
print(sc.labels_)
print('just for better-visualization: invert clusters (permutation)')
print(np.abs(sc.labels_ - 1))

# Calculate some clustering metrics
print(metrics.adjusted_rand_score(gt, sc.labels_))
print(metrics.adjusted_mutual_info_score(gt, sc.labels_))

Output:

ground truth
[0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
spectral clustering
[1 1 0 1 1 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
just for better-visualization: invert clusters (permutation)
[0 0 1 0 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
0.204094758281
0.271689477828
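
The inverted label row printed above exists only for visual comparison: cluster ids are arbitrary, and adjusted_rand_score gives the same value under any relabeling. As a minimal sketch of that alignment step, the hypothetical helper align_binary_labels below (not part of scikit-learn) flips a two-cluster labeling when the flip agrees better with the ground truth. The arrays are toy data, not the karate club labels.

```python
import numpy as np

def align_binary_labels(truth, pred):
    """Return pred or its 0/1 inversion, whichever agrees with truth more often.

    Hypothetical helper for display purposes only; it assumes exactly two
    clusters labeled 0 and 1.
    """
    truth = np.asarray(truth)
    pred = np.asarray(pred)
    flipped = 1 - pred
    if np.sum(flipped == truth) > np.sum(pred == truth):
        return flipped
    return pred

# Toy data: the prediction uses the opposite cluster ids and has one mistake.
truth = np.array([0, 0, 0, 1, 1, 1])
pred = np.array([1, 1, 0, 0, 0, 0])

aligned = align_binary_labels(truth, pred)
agreement = int(np.sum(aligned == truth))
print(aligned, agreement)
```

The helper only changes how the labels are displayed; the clustering itself and any permutation-invariant score are unaffected.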

General idea:

An introduction to the data and the task, from here:

The nodes in the graph represent the 34 members in a college Karate club. (Zachary is a sociologist, and he was one of the members.) An edge between two nodes indicates that the two members spent significant time together outside normal club meetings. The dataset is interesting because while Zachary was collecting his data, there was a dispute in the Karate club, and it split into two factions: one led by “Mr. Hi”, and one led by “John A”. It turns out that using only the connectivity information (the edges), it is possible to recover the two factions.

Using sklearn and spectral clustering to tackle this:

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.

This describes a normalized graph cut as:

Find two disjoint partitions A and B of the vertices V of a graph, so that A ∪ B = V and A ∩ B = ∅

Given a similarity measure w(i,j) between two vertices (e.g. identity when they are connected) a cut value (and its normalized version) is defined as: cut(A, B) = SUM u in A, v in B: w(u, v)

...

we seek the minimization of disassociation between the groups A and B and the maximization of the association within each group
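
To make the quoted definition concrete, here is a small sketch that evaluates the cut value on a toy graph. The quote elides the normalized version, so the sketch assumes the commonly used Shi and Malik form, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), where assoc(A, V) is the total edge weight attached to the vertices in A. The function names and the toy graph are illustrative, not taken from the quoted source.

```python
import numpy as np

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
W = np.zeros((n, n))
for u, v in edges:
    W[u, v] = W[v, u] = 1.0  # w(u, v) = 1 when u and v are connected

def cut_value(W, in_a):
    """cut(A, B): sum of w(u, v) over u in A and v in B (the complement of A)."""
    in_a = np.asarray(in_a, dtype=bool)
    return W[np.ix_(in_a, ~in_a)].sum()

def normalized_cut(W, in_a):
    """Assumed Shi-Malik form: cut/assoc(A, V) + cut/assoc(B, V)."""
    in_a = np.asarray(in_a, dtype=bool)
    cut = cut_value(W, in_a)
    assoc_a = W[in_a, :].sum()   # total weight attached to vertices in A
    assoc_b = W[~in_a, :].sum()  # total weight attached to vertices in B
    return cut / assoc_a + cut / assoc_b

balanced = [True, True, True, False, False, False]    # split at the bridge
lopsided = [True, False, False, False, False, False]  # isolate vertex 0

print(cut_value(W, balanced), normalized_cut(W, balanced))
print(cut_value(W, lopsided), normalized_cut(W, lopsided))
```

Splitting at the bridge cuts one edge and keeps both sides well connected, so its normalized cut is much smaller than that of the lopsided split, which is the behavior the quoted passage describes.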

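Sounds good. So we create the adjacency matrix (nx.to_numpy_array(G); older networkx versions also offered nx.to_numpy_matrix(G)) and set the parameter affinity to precomputed (because the adjacency matrix is our precomputed similarity measure).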

Alternatively, using precomputed, a user-provided affinity matrix can be used.
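
For intuition about what happens to a precomputed adjacency matrix, the following is a minimal numpy sketch of a two-way spectral split: build the symmetric normalized Laplacian, take the eigenvector of the second-smallest eigenvalue, and split the vertices by its sign. This is a textbook simplification under the assumption of a connected graph without isolated vertices; it is not scikit-learn's actual implementation, which embeds the data and then assigns labels with k-means or discretization. The function name spectral_bipartition is hypothetical.

```python
import numpy as np

def spectral_bipartition(adj):
    """Split a connected graph in two using the sign of the Fiedler vector.

    Simplified sketch: assumes a symmetric adjacency matrix, a connected
    graph, and no isolated vertices (every degree must be positive).
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^(-1/2) A D^(-1/2)
    lap_sym = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 belongs to the
    # second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(lap_sym)
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

labels = spectral_bipartition(adj)
print(labels)
```

Which triangle receives label 0 and which receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.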

Edit: Although I am not familiar with this, I looked for parameters to tune and found assign_labels:

The strategy to use to assign labels in the embedding space. There are two ways to assign labels after the laplacian embedding. k-means can be applied and is a popular choice. But it can also be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization.

So, trying the less sensitive approach:

sc = SpectralClustering(2, affinity='precomputed', n_init=100, assign_labels='discretize')

Output:

ground truth
[0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
spectral clustering
[0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
just for better-visualization: invert clusters (permutation)
[1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
0.771725032425
0.722546051351

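This matches the ground truth very well: the adjusted Rand score rises from about 0.20 to about 0.77, and the adjusted mutual information from about 0.27 to about 0.72.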
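
To compare the two label-assignment strategies side by side, a sketch along the following lines could be used. It assumes a networkx release that provides nx.to_numpy_array and fixes random_state so that repeated runs agree; the exact scores depend on the installed library versions and may differ from the numbers shown above.

```python
import networkx as nx
import numpy as np
from sklearn import metrics
from sklearn.cluster import SpectralClustering

G = nx.karate_club_graph()

# Ground truth: 0 for the 'Mr. Hi' faction, 1 for the other faction.
gt = np.array([0 if G.nodes[n]['club'] == 'Mr. Hi' else 1 for n in G.nodes()])

# Binary adjacency matrix; weight=None ignores any edge weights on the graph.
adj_mat = nx.to_numpy_array(G, weight=None)

scores = {}
for strategy in ('kmeans', 'discretize'):
    sc = SpectralClustering(n_clusters=2, affinity='precomputed', n_init=100,
                            assign_labels=strategy, random_state=1)
    labels = sc.fit_predict(adj_mat)
    # The adjusted Rand score does not depend on which cluster is called 0 or 1.
    scores[strategy] = metrics.adjusted_rand_score(gt, labels)
    print(strategy, scores[strategy])
```

Running both strategies on the same matrix makes it easier to see whether a difference comes from the label-assignment step rather than from the embedding.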

Source: the Stack Overflow question "python - Spectral clustering a graph in Python": https://stackoverflow.com/questions/46258657/
