tensorflow - Using "tf.contrib.factorization.KMeansClustering"

Tags: tensorflow machine-learning unsupervised-learning tensorflow-estimator

Referring to this link (the Link), I am trying to practice using tf.contrib.factorization.KMeansClustering for clustering. The simple code below works fine:

import numpy as np
import tensorflow as tf

# ---- Create Data Sample -----
k = 5
n = 100
variables = 5
points = np.random.uniform(0, 1000, [n, variables])

# ---- Clustering -----
input_fn = lambda: tf.train.limit_epochs(
    tf.convert_to_tensor(points, dtype=tf.float32), num_epochs=1)
kmeans = tf.contrib.factorization.KMeansClustering(num_clusters=6)
kmeans.train(input_fn=input_fn)
centers = kmeans.cluster_centers()

# ---- Print out -----
cluster_indices = list(kmeans.predict_cluster_index(input_fn))
for i, point in enumerate(points):
  cluster_index = cluster_indices[i]
  print('point:', point, 'is in cluster', cluster_index, 'centered at', centers[cluster_index])

My question is: why does this "input_fn" code do the trick? If I change the code to the following, it runs into an infinite loop. Why?

input_fn=lambda:tf.convert_to_tensor(points, dtype=tf.float32)

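From the documentation (here), it seems train() expects an input_fn argument that simply yields a 'tf.data.Dataset' object, or something like Tensor(X). So why do I have to do all this tricky stuff with lambda: tf.train.limit_epochs()?

Can someone who is familiar with the fundamentals of tensorflow estimators help explain? Many thanks!

Best answer

"My question is: why does this "input_fn" code do the trick? If I change the code to the following, it runs into an infinite loop. Why?"

The documentation states that the input produced by input_fn is consumed repeatedly until a tf.errors.OutOfRangeError is raised. A plain tensor made with tf.convert_to_tensor can be read any number of times and never raises that error, so training never ends. Wrapping your tensor in tf.train.limit_epochs ensures the error is eventually raised (here, after one pass over the data), which signals to KMeans that it should stop training.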
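The stopping behaviour described in this answer can be illustrated without TensorFlow. The following is only a toy simulation under stated assumptions, not how the Estimator is actually implemented: OutOfRangeError, limit_epochs, unlimited and train are hypothetical stand-ins written for this sketch. It shows that a loop which reads input until an out-of-range error stops after one epoch when the input is epoch-limited, and would otherwise run until some external cap is applied.

```python
class OutOfRangeError(Exception):
    """Hypothetical stand-in for tf.errors.OutOfRangeError."""


def limit_epochs(batch, num_epochs):
    """Return a reader that yields `batch` num_epochs times, then raises."""
    remaining = [num_epochs]

    def read():
        if remaining[0] <= 0:
            raise OutOfRangeError("epoch limit reached")
        remaining[0] -= 1
        return batch

    return read


def unlimited(batch):
    """Return a reader that yields `batch` forever, like a bare tensor."""
    def read():
        return batch

    return read


def train(read, max_steps=None):
    """Toy loop: keep reading until OutOfRangeError or max_steps is hit."""
    steps = 0
    while max_steps is None or steps < max_steps:
        try:
            read()
        except OutOfRangeError:
            break  # input exhausted: the signal to stop training
        steps += 1
    return steps


points = [[1.0, 2.0], [3.0, 4.0]]

# Epoch-limited input: the loop ends on its own after one pass.
steps_limited = train(limit_epochs(points, num_epochs=1))

# Unlimited input: without the max_steps cap this call would never return.
steps_capped = train(unlimited(points), max_steps=10)

print(steps_limited)
print(steps_capped)
```

The max_steps cap in this sketch mirrors a second way to bound training: as far as I know, Estimator.train also accepts steps and max_steps arguments, so passing one of those may be an alternative to limiting epochs in input_fn.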

Regarding tensorflow - Using "tf.contrib.factorization.KMeansClustering", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49418325/
