algorithm - Initial centroids in k-means

I found this description online:

Start with the center of all points. Choose successively the point that is the furthest away from all centers as a center for the next cluster.

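So my understanding is:

center = the mean of all points

centroid1 = the point furthest from center

centroid2 = the point furthest from center AND centroid1

centroid3 = the point furthest from center AND centroid1 AND centroid2

My question is: how do I compute the point that is furthest from both center and centroid1? Do I average the two and pick the point furthest from that midpoint? Or do I find the furthest point from center and the furthest point from centroid1 separately, and take whichever is further? If so, wouldn't centroid3 end up equal to centroid1 or centroid2?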

Best answer

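In the paper Centroids Initialization for K-Means Clustering using Improved Pillar Algorithm, "furthest" refers to the sum of distances. So in the second step, for each point you add its distance to the first centroid to its distance from the mean of all points, and then select the point with the largest total.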

The relevant lines of the pseudocode given in the paper are:

2. Calculate D <- dis(X, m)
...
6. Set i = 1 as counter to determine the i-th initial centroid
7. DM = DM + D
8. Select x <- xargmax(DM) as the candidate for i-th initial centroids

To select a next x for the candidate of the rest initial centroids, D_i (where i is the current iteration step) is recalculated between each data points and c_i-1 . The D_i is then added to the accumulated distance metric DM (DM <- DM + D_i).
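
The accumulation described above can be sketched in Python roughly as follows. This is only a simplified illustration of the DM <- DM + D_i step, not the full algorithm from the paper: the function name `accumulated_distance_init`, the use of Euclidean distance, and the masking of already chosen points (so the same point is not picked twice) are assumptions of this sketch.

```python
import numpy as np

def accumulated_distance_init(X, k):
    """Pick k initial centroids by accumulated distance (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if k > n:
        raise ValueError("k must not exceed the number of points")

    m = X.mean(axis=0)                    # center of all points
    D = np.linalg.norm(X - m, axis=1)     # D <- dis(X, m), Euclidean (assumed)
    DM = np.zeros(n)                      # accumulated distance metric
    taken = np.zeros(n, dtype=bool)       # assumption: never reselect a point
    chosen = []

    for _ in range(k):
        DM += D                           # DM <- DM + D
        # Hide already chosen points, then take the largest accumulated sum.
        scores = np.where(taken, -np.inf, DM)
        idx = int(np.argmax(scores))
        chosen.append(idx)
        taken[idx] = True
        # Recompute D between every point and the centroid just selected.
        D = np.linalg.norm(X - X[idx], axis=1)

    return X[chosen], chosen
```

Calling `accumulated_distance_init(points, 3)` returns the selected points together with their row indices. Because the distances are summed rather than compared one by one, a point has to be far from the center and from every earlier centroid at once to score highly. Without the mask, an earlier centroid can in some data sets regain the largest sum, which is the duplication the question worries about.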

Regarding algorithm - Initial centroids in k-means, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53765197/
