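algorithm - Drawbacks of the CURE algorithm

I have searched a lot but could not find any drawbacks of the CURE algorithm. Does the CURE clustering algorithm have any limitations?

Thanks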

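Best answer

This explanation is taken from the Wikipedia article on the CURE algorithm.

The short answer is its runtime complexity:

Running time is O(n^2 log n)
Space complexity is O(n)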

For database applications this is a fairly high runtime complexity, so you probably cannot apply the algorithm directly to a large database.
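To get a feel for how fast O(n^2 log n) grows, here is a rough back-of-the-envelope sketch. The function name `cure_cost` is hypothetical, the base-2 logarithm is an assumption, and constant factors are ignored, so the numbers only indicate relative growth, not real running times.

```python
import math

def cure_cost(n: int) -> float:
    """Abstract cost proportional to n^2 * log2(n); constants are ignored."""
    return n * n * math.log2(n)

# Compare the relative cost for growing input sizes.
for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7}: ~{cure_cost(n):.2e} abstract operations")
```

Growing the input by a factor of 10 raises the cost by more than a factor of 100, which is why the full algorithm is usually not run on an entire large table.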

According to Wikipedia, this limitation can be mitigated with the following techniques:

Random sampling : random sampling supports large data sets. Generally the random sample fits in main memory. The random sampling involves a trade off between accuracy and efficiency.
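A minimal sketch of the sampling step, assuming the points are already available as a Python list of tuples. The helper `draw_sample` and the sample size of 500 are illustrative choices, not values from the CURE description.

```python
import random

def draw_sample(points, sample_size, seed=0):
    """Return a uniform random sample without replacement.

    If the requested size covers the whole data set, return a copy of it.
    """
    if sample_size >= len(points):
        return list(points)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(points, sample_size)

# Hypothetical data set: 10,000 two-dimensional points.
points = [(float(i), float(i % 7)) for i in range(10_000)]
sample = draw_sample(points, 500)
print(len(sample))
```

Clustering then runs on `sample` only. A smaller sample is faster but more likely to miss small clusters, which is the accuracy versus efficiency trade-off mentioned above.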

Partitioning : The basic idea is to partition the sample space into p partitions. Each partition contains n/p elements. The first pass partially clusters each partition until the final number of clusters reduces to n/pq for some constant q ≥ 1. A second clustering pass on n/q partially clusters partitions. For the second pass only the representative points are stored since the merge procedure only requires representative points of previous clusters before computing the representative points for the merged cluster. Partitioning the input reduces the execution times.
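The arithmetic of the partitioning scheme can be worked through with a small sketch. The function `partition_plan` and the example values n = 10,000, p = 10, q = 5 are assumptions chosen so that everything divides evenly.

```python
def partition_plan(n, p, q):
    """Sizes involved in two-pass partitioned clustering.

    n: number of sampled points, p: number of partitions,
    q: reduction factor applied within each partition (q >= 1).
    """
    per_partition = n // p                  # points in each partition
    clusters_per_partition = n // (p * q)   # partial clusters left after pass one
    second_pass_input = clusters_per_partition * p  # n/q clusters enter pass two
    return per_partition, clusters_per_partition, second_pass_input

print(partition_plan(n=10_000, p=10, q=5))  # (1000, 200, 2000)
```

With these numbers the first pass clusters ten groups of 1,000 points each, and the second pass only has to merge 2,000 partial clusters instead of starting from 10,000 individual points.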

Labeling data on disk : Given only representative points for k clusters, the remaining data points are also assigned to the clusters. For this a fraction of randomly selected representative points for each of the k clusters is chosen and data point is assigned to the cluster containing the representative point closest to it.
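The labeling step can be sketched as a nearest-representative lookup. This is a simplified illustration: it assumes Euclidean distance and checks every representative it is given, whereas the description above uses a randomly selected fraction of them. The names `assign_label` and `reps` are hypothetical.

```python
import math

def assign_label(point, representatives):
    """Return the id of the cluster owning the representative closest to point.

    representatives: dict mapping cluster id -> list of representative points.
    """
    best_id, best_dist = None, math.inf
    for cluster_id, reps_of_cluster in representatives.items():
        for rep in reps_of_cluster:
            d = math.dist(point, rep)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_id, best_dist = cluster_id, d
    return best_id

# Two hypothetical clusters with two representative points each.
reps = {
    "A": [(0.0, 0.0), (1.0, 0.0)],
    "B": [(10.0, 10.0), (9.0, 10.0)],
}
print(assign_label((0.5, 0.2), reps))  # A
print(assign_label((8.0, 9.0), reps))  # B
```

Because each remaining point is only compared against a handful of representatives, the data on disk can be labeled in a single sequential scan.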

Source: the original Stack Overflow question on the drawbacks of the CURE algorithm: https://stackoverflow.com/questions/44313576/
