python - Use a distance matrix in scipy.cluster.hierarchy.linkage()?

I have an n*n distance matrix M, where M_ij is the distance between object_i and object_j. So, as expected, it takes the following form:

   /  0     M_01    M_02    ...    M_0n \
   | M_10    0      M_12    ...    M_1n |
   | M_20   M_21     0      ...    M_2n |
   |                ...                 |
   \ M_n0   M_n1    M_n2    ...     0   /

Now I want to cluster these n objects with hierarchical clustering. Python has an implementation called scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean').

Its documentation says:

y must be a {n \choose 2} sized vector where n is the number of original observations paired in the distance matrix.

y : ndarray

A condensed or redundant distance matrix. A condensed distance matrix is a flat array containing the upper triangular of the distance matrix. This is the form that pdist returns. Alternatively, a collection of m observation vectors in n dimensions may be passed as an m by n array.
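To make the two forms in that description concrete, here is a minimal sketch. The four points are made up purely for illustration; pdist and squareform are the SciPy functions the documentation refers to.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Four hypothetical 2-D points, used only for illustration.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])

# Condensed form: a flat vector holding the upper triangle, row by row.
condensed = pdist(points)

# Redundant form: the full symmetric n-by-n matrix with a zero diagonal.
square = squareform(condensed)

n = points.shape[0]
print(condensed.shape)  # n * (n - 1) / 2 entries
print(square.shape)     # n by n
```

For n observations the condensed vector has n choose 2 entries, which is the size the first sentence of the documentation asks for.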

I am confused by the description of y. Can I pass my M directly as the input y?

Update

@hongbo-zhu-cn has raised this issue on GitHub. This is exactly what concerns me. However, as a GitHub newcomer, I don't know how it works, and so I don't know how this issue is being handled.

Best answer

It seems that we indeed cannot pass the redundant square matrix in directly, even though the documentation claims we can.
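A small experiment that illustrates the problem, as a sketch rather than a statement about every SciPy version: as far as I can tell, linkage() treats any 2-D input as a set of observation vectors, so each row of the square matrix is read as a point and distances between rows are recomputed. The points below are hypothetical, and the warning filter is there because some SciPy versions warn about this kind of input.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical points on a line, used only for illustration.
points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [12.0, 0.0]])
condensed = pdist(points)
square = squareform(condensed)

# Clustering on the true distances (condensed form).
Z_condensed = linkage(condensed, method='single')

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # some versions warn about square input
    # Passing the square matrix directly.
    Z_square = linkage(square, method='single')
    # Explicitly treating each row of the square matrix as an observation.
    Z_rows = linkage(pdist(square), method='single')

same_as_rows = np.allclose(Z_square, Z_rows)
same_as_condensed = np.allclose(Z_square, Z_condensed)
print(same_as_rows, same_as_condensed)
```

If the square matrix were used as a distance matrix, the second comparison would succeed; here the merge heights differ, which is why the matrix has to be condensed first.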

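For the benefit of anyone who runs into the same problem in the future, I am writing my solution here as an additional answer, so that those who copy and paste can get on with their clustering.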

Use the following snippet to condense the matrix and happily proceed.

import scipy.spatial.distance as ssd

# distMatrix is the redundant n*n square distance matrix (symmetric, zero diagonal)
# convert it into a condensed array of length n choose 2
distArray = ssd.squareform(distMatrix)
# distArray[{n choose 2} - {n-i choose 2} + (j-i-1)] is the distance between points i and j (i < j)
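For completeness, here is a self-contained sketch of the whole path from a square matrix to cluster labels. The 4*4 matrix, the helper condensed_index, the choice of single linkage, and the cut into two clusters are all assumptions made for illustration; only squareform, linkage, and fcluster come from SciPy.

```python
from math import comb

import numpy as np
import scipy.spatial.distance as ssd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical symmetric distance matrix with a zero diagonal.
distMatrix = np.array([
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 8.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 8.0, 2.0, 0.0],
])

# Condense the square matrix into a vector of length n choose 2.
distArray = ssd.squareform(distMatrix)


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i < j, in the condensed array."""
    return comb(n, 2) - comb(n - i, 2) + (j - i - 1)


n = distMatrix.shape[0]

# Hierarchical clustering on the condensed distances.
Z = linkage(distArray, method='single')

# Cut the tree into at most two flat clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

With this matrix, objects 0 and 1 end up in one cluster and objects 2 and 3 in the other, and condensed_index reproduces the indexing formula from the comment above.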

Please correct me if I am wrong.

This question, "python - Use a distance matrix in scipy.cluster.hierarchy.linkage()?", comes from Stack Overflow: https://stackoverflow.com/questions/18952587/
