Is the MATLAB kmeans 'onlinephase' option the same as incremental/sequential k-means?

Tags: matlab cluster-analysis k-means

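I'm just a little confused. Can MATLAB's existing kmeans function run k-means on data that arrives serially, one point at a time? Or does 'onlinephase' mean something else?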

Best answer

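No. 'onlinephase' is performed as a second step, after the usual "assign, then recompute" iterations (in batch mode). Given the distance function in use (and the initial cluster centroids), it guarantees a solution that is a local minimum, by moving points between clusters until the sum of distances cannot be reduced any further.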

Do not confuse this local minimum with the global minimum (finding the global minimum is, I believe, an NP-hard problem).
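For contrast, here is a minimal sketch of what "incremental/sequential k-means" usually refers to: a MacQueen-style update where each incoming point adjusts only its nearest centroid and is then discarded. This is written in Python with NumPy purely for illustration; the function name `sequential_kmeans` and the choice to give the initial centroids zero weight are assumptions of this sketch, and it is not something MATLAB's kmeans does.

```python
import numpy as np

def sequential_kmeans(stream, init_centroids):
    """Hypothetical MacQueen-style sequential k-means (illustration only).

    Each point is seen once: it is assigned to the nearest centroid, and
    that centroid is nudged toward the point by a running-mean update.
    """
    centroids = np.array(init_centroids, dtype=float)
    # Number of points absorbed by each centroid so far. Starting at zero is
    # an assumption: the first point assigned to a centroid replaces it.
    counts = np.zeros(len(centroids), dtype=int)
    for x in stream:
        x = np.asarray(x, dtype=float)
        # Nearest centroid by squared Euclidean distance.
        j = int(np.argmin(((centroids - x) ** 2).sum(axis=1)))
        counts[j] += 1
        # Running mean: m_j <- m_j + (x - m_j) / n_j
        centroids[j] += (x - centroids[j]) / counts[j]
    return centroids, counts
```

Note that this never revisits earlier points, whereas the 'onlinephase' step described above repeatedly passes over the full data set that kmeans was given.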

The documentation explains the two phases well:

Algorithms

kmeans uses a two-phase iterative algorithm to minimize the sum of point-to-centroid distances, summed over all k clusters:

  • The first phase uses batch updates, where each iteration consists of reassigning points to their nearest cluster centroid, all at once, followed by recalculation of cluster centroids. This phase occasionally does not converge to a solution that is a local minimum, that is, a partition of the data where moving any single point to a different cluster increases the total sum of distances. This is more likely for small data sets. The batch phase is fast, but potentially only approximates a solution as a starting point for the second phase.

  • The second phase uses online updates, where points are individually reassigned if doing so will reduce the sum of distances, and cluster centroids are recomputed after each reassignment. Each iteration during the second phase consists of one pass through all the points. The second phase will converge to a local minimum, although there may be other local minima with lower total sum of distances. The problem of finding the global minimum can only be solved in general by an exhaustive (or clever, or lucky) choice of starting points, but using several replicates with random starting points typically results in a solution that is a global minimum.
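The two phases quoted above can be sketched roughly as follows. This is a toy Python/NumPy illustration under stated assumptions, not MATLAB's actual implementation: it assumes squared Euclidean distance, the names `batch_phase`, `online_phase` and `sse` are hypothetical, and the online step uses the standard closed-form change in the sum of squared distances when a single point switches clusters.

```python
import numpy as np

def sse(X, labels, centroids):
    """Total squared distance of every point to its assigned centroid."""
    return float(((X - centroids[labels]) ** 2).sum())

def batch_phase(X, centroids, max_iter=100):
    """Phase 1 (sketch): reassign all points at once, then recompute centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(centroids, dtype=float)
    labels = None
    for _ in range(max_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centroids)):
            if np.any(labels == j):  # an empty cluster keeps its old centroid
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def online_phase(X, labels, centroids, max_passes=100):
    """Phase 2 (sketch): move one point at a time if that lowers the total."""
    X = np.asarray(X, dtype=float)
    labels = labels.copy()
    centroids = centroids.copy()
    counts = np.bincount(labels, minlength=len(centroids))
    for _ in range(max_passes):
        moved = False
        for i, x in enumerate(X):
            a = labels[i]
            if counts[a] <= 1:
                continue  # assumption of this sketch: never empty a cluster
            d = ((centroids - x) ** 2).sum(axis=1)
            # Increase in the total if x joins cluster b: n_b / (n_b + 1) * d_b
            cost = counts / (counts + 1.0) * d
            # Decrease in the total if x leaves cluster a: n_a / (n_a - 1) * d_a
            cost[a] = counts[a] / (counts[a] - 1.0) * d[a]
            b = int(np.argmin(cost))
            if b != a and cost[b] < cost[a] - 1e-12:
                # Update both centroids immediately, after this single move.
                centroids[a] = (counts[a] * centroids[a] - x) / (counts[a] - 1)
                centroids[b] = (counts[b] * centroids[b] + x) / (counts[b] + 1)
                counts[a] -= 1
                counts[b] += 1
                labels[i] = b
                moved = True
        if not moved:
            break  # a full pass with no move: single-point local minimum
    return labels, centroids

# Toy 1-D data where the batch phase stalls before a local minimum.
X = np.array([[0.0], [2.0], [3.1], [3.3]])
lab_b, cen_b = batch_phase(X, [[1.0], [3.2]])
lab_o, cen_o = online_phase(X, lab_b, cen_b)
print(sse(X, lab_b, cen_b), sse(X, lab_o, cen_o))
```

In this toy example the batch phase stops at the partition {0, 2} / {3.1, 3.3} with a total of about 2.02, because the point at 2 is still nearest to its own centroid. The online phase then moves that point to the second cluster, which lowers the total to about 0.98. This matches the documentation's remark that the batch phase can stall on small data sets.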

Source: the Stack Overflow question "Is the MATLAB kmeans 'onlinephase' option the same as incremental/sequential k-means?", https://stackoverflow.com/questions/11764779/
