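I'm a bit confused. Can MATLAB's built-in kmeans function run k-means sequentially over the data (that is, sequential/incremental k-means)? Or does 'onlinephase' mean something else?
Best answer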
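No. The 'onlinephase' option runs as a second step after the usual "assign, then recompute" iterations (the batch phase). Given the distance function in use and the initial cluster centroids, it guarantees a locally optimal solution by moving individual points between clusters until the sum of distances cannot be reduced any further.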
Don't confuse this with finding the global minimum (which I believe is an NP-hard problem).
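For contrast, here is a minimal sketch of what "sequential" (incremental) k-means usually refers to: each point is seen once, assigned to its nearest centroid, and that centroid is nudged toward it with a running mean. The function name `sequential_kmeans`, the squared Euclidean distance, and the choice to give the initial centroids zero weight are assumptions made for illustration; this is not what MATLAB's kmeans does, with or without 'onlinephase'.

```python
def sequential_kmeans(stream, init_centroids):
    """One-pass k-means sketch: every point is processed exactly once."""
    centroids = [list(c) for c in init_centroids]
    counts = [0] * len(centroids)  # assumption: initial guesses carry no weight
    for x in stream:
        # Pick the nearest centroid by squared Euclidean distance.
        j = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        counts[j] += 1
        # Running-mean update: move centroid j toward x by a 1/count step.
        centroids[j] = [c + (a - c) / counts[j] for a, c in zip(x, centroids[j])]
    return centroids


if __name__ == "__main__":
    points = [(0.0,), (10.0,), (2.0,), (12.0,)]
    print(sequential_kmeans(points, [(0.0,), (10.0,)]))
```

Earlier points are never revisited here, whereas the online phase of kmeans makes repeated passes over the full data set.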
The documentation explains this well:
Algorithms
kmeans uses a two-phase iterative algorithm to minimize the sum of point-to-centroid distances, summed over all k clusters:
The first phase uses batch updates, where each iteration consists of reassigning points to their nearest cluster centroid, all at once, followed by recalculation of cluster centroids. This phase occasionally does not converge to solution that is a local minimum, that is, a partition of the data where moving any single point to a different cluster increases the total sum of distances. This is more likely for small data sets. The batch phase is fast, but potentially only approximates a solution as a starting point for the second phase.
The second phase uses online updates, where points are individually reassigned if doing so will reduce the sum of distances, and cluster centroids are recomputed after each reassignment. Each iteration during the second phase consists of one pass though all the points. The second phase will converge to a local minimum, although there may be other local minima with lower total sum of distances. The problem of finding the global minimum can only be solved in general by an exhaustive (or clever, or lucky) choice of starting points, but using several replicates with random starting points typically results in a solution that is a global minimum.
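The two phases described above can be sketched roughly as follows. This is a brute-force illustration, not MATLAB's implementation: it assumes squared Euclidean distance, assumes no cluster is empty when the online phase runs, and all function names (`batch_phase`, `online_phase`, and the helpers) are hypothetical.

```python
def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def total_cost(X, labels, C):
    """Sum of point-to-centroid squared distances."""
    return sum(sqdist(x, C[l]) for x, l in zip(X, labels))


def centroids_from(X, labels, k):
    # Assumption: every cluster has at least one member.
    return [mean([x for x, l in zip(X, labels) if l == j]) for j in range(k)]


def batch_phase(X, C, max_iter=100):
    """Phase 1: reassign all points at once, then recompute centroids."""
    C = [list(c) for c in C]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(C)), key=lambda j: sqdist(x, C[j])) for x in X]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(len(C)):
            members = [x for x, l in zip(X, labels) if l == j]
            if members:  # keep the old centroid if a cluster went empty
                C[j] = mean(members)
    return labels, C


def online_phase(X, labels, k, max_passes=100):
    """Phase 2: move one point at a time if that lowers the total cost."""
    labels = list(labels)
    best = total_cost(X, labels, centroids_from(X, labels, k))
    for _ in range(max_passes):
        moved = False
        for i in range(len(X)):
            src = labels[i]
            if labels.count(src) == 1:
                continue  # never empty a cluster
            for j in range(k):
                if j == src:
                    continue
                labels[i] = j  # tentative move, centroids recomputed below
                cost = total_cost(X, labels, centroids_from(X, labels, k))
                if cost < best:
                    best, src, moved = cost, j, True
                labels[i] = src  # keep the accepted cluster, else revert
        if not moved:
            break  # a full pass with no improving move: local minimum
    return labels, centroids_from(X, labels, k), best


if __name__ == "__main__":
    X = [(0.0,), (10.0,), (15.0,), (16.0,), (17.0,)]
    labels, C = batch_phase(X, [(5.0,), (16.0,)])
    print(labels, total_cost(X, labels, C))
    print(online_phase(X, labels, 2))
```

On this small data set the batch phase stops at a partition with total cost 52, because the point 10 is nearer to its own centroid (5) than to the other (16). The online phase still moves it, since recomputing both centroids after the move lowers the total to 29. This is the "small data set" situation the documentation mentions.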
Source: Stack Overflow question "Is the matlab onlinephase option of kmeans the same as incremental/sequential kmeans?": https://stackoverflow.com/questions/11764779/