algorithm - Cure算法的缺点

标签 algorithm cluster-analysis

找了很多都没有找到Cure算法的缺点。 Cure聚类算法有什么局限性吗?

谢谢

最佳答案

Wikipedia Article 中获取此解释关于治愈算法

简短的回答是运行时复杂性

  • 运行时间为 O(n^2 log(n))
  • 空间复杂度为 O(n)

对于数据库应用程序,这是一个相当高的运行时复杂度,因此您可能无法将其直接应用于大型数据库

根据维基百科,可以使用以下方法缓解此限制

  • Random sampling : random sampling supports large data sets. Generally the random sample fits in main memory. The random sampling involves a trade off between accuracy and efficiency.
  • Partitioning : The basic idea is to partition the sample space into p partitions. Each partition contains n/p elements. The first pass partially clusters each partition until the final number of clusters reduces to n/pq for some constant q ≥ 1. A second clustering pass on n/q partially clusters partitions. For the second pass only the representative points are stored since the merge procedure only requires representative points of previous clusters before computing the representative points for the merged cluster. Partitioning the input reduces the execution times.
  • Labeling data on disk : Given only representative points for k clusters, the remaining data points are also assigned to the clusters. For this a fraction of randomly selected representative points for each of the k clusters is chosen and data point is assigned to the cluster containing the representative point closest to it.

关于algorithm - Cure算法的缺点,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44313576/

相关文章:

python - Python 中的素数和完美平方检查器

algorithm - 寻找哈希算法的伪代码(开放、链接和多重)

string - 在 KMP 算法中发生不匹配时转移文本的原因是什么?

R 聚类产生错误消息

python - 聚类时间序列时出现不规则/缺失数据

cluster-analysis - 集群间和集群内距离

c - 高效模拟滚动加权骰子(或遍历加权图),并频繁更新

php - 打印级联表

r - 如何使用 FactoMineR 包以编程方式确定主成分的列索引?

python - 将多维集群绘制成二维图 python