python - Determining cluster sizes after k-means in Python

Tags: python machine-learning cluster-computing data-analysis

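I have successfully found the optimal number of clusters for the k-means algorithm in Python, but how can I now find the exact size of each cluster obtained after applying k-means?

Here is a snippet of the code: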

import math
import numpy as np
from scipy import cluster
from scipy.cluster.vq import kmeans, vq

# stack the two ID arrays into an (n_samples, 2) observation matrix
data = np.column_stack((simpleassetid_arr, simpleuidarr))
centroids, _ = kmeans(data, int(round(math.sqrt(len(uidarr) / 2))))
idx, _ = vq(data, centroids)

initial = [cluster.vq.kmeans(data, i) for i in range(1, 10)]
var = [var for (cent, var) in initial]  # distortion per k, used to pick k with the elbow test
num_k = int(input("Enter the number of clusters: "))

cent, var = initial[num_k - 1]

assignment, cdist = cluster.vq.vq(data, cent)

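Best answer

You can get the cluster sizes with np.bincount, which counts how many samples were assigned to each cluster index: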

print(np.bincount(idx))
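To see what np.bincount does with a label array, here is a tiny sketch. The `labels` array is made up for illustration and stands in for the `idx` array returned by vq.

```python
import numpy as np

# hypothetical cluster labels for five samples (stand-in for idx from vq)
labels = np.array([0, 1, 1, 0, 1])

# entry i of the result is the number of samples labeled i
sizes = np.bincount(labels)
print(sizes)  # [2 3]
```

Cluster 0 has two members and cluster 1 has three, so position i of the result is the size of cluster i.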

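For the example below, np.bincount(idx) prints a two-element array, for example [156 144]. The exact counts vary from run to run because the data is random.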

from numpy import vstack, array
import numpy as np
from numpy.random import rand
from scipy.cluster.vq import kmeans, vq

# data generation: two blobs of 150 points each, one shifted by (0.5, 0.5)
data = vstack((rand(150, 2) + array([.5, .5]), rand(150, 2)))
# computing K-Means with K = 2 (2 clusters)
centroids, _ = kmeans(data, 2)
# assign each sample to a cluster
idx, _ = vq(data, centroids)

# print the number of elements per cluster
print(np.bincount(idx))
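As a possible refinement, the sketch below reports sizes in two ways. It assumes you want one entry per centroid even when the highest-numbered clusters received no points, so it passes `minlength` to np.bincount. It also uses np.unique with `return_counts=True` to get explicit label/size pairs. The synthetic data and the names `sizes` and `size_by_label` are illustrative and not part of the original answer.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# synthetic data (illustrative only): two blobs of 150 points each
rng = np.random.default_rng(0)
data = np.vstack((rng.random((150, 2)) + 0.5, rng.random((150, 2))))

# kmeans may return fewer centroids than requested, so use len(centroids) below
centroids, _ = kmeans(data, 2)
idx, _ = vq(data, centroids)

# minlength keeps a zero entry for trailing clusters that received no points
sizes = np.bincount(idx, minlength=len(centroids))

# alternative: explicit {label: size} pairs for the labels that actually occur
labels, counts = np.unique(idx, return_counts=True)
size_by_label = dict(zip(labels.tolist(), counts.tolist()))

print(sizes)
print(size_by_label)
```

With the question's variables, the same idea would be np.bincount(assignment, minlength=len(cent)).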

This Q&A on determining cluster sizes after k-means in Python corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/34024975/
