k-means - How to implement k-means with TensorFlow?

Tags: k-means tensorflow

It makes sense for the introductory tutorials to use the built-in gradient descent optimizer, but k-means isn't something you can simply plug into gradient descent. It seems I'd have to write my own optimizer, but given the TensorFlow primitives I'm not sure how to go about it.

What approach should I take?

Best Answer

(Note: you can now get a more polished version of this code as a gist on github.)

You can definitely do this, but you need to define your own optimization criteria (for k-means, it's usually a maximum iteration count plus stopping once the assignments stabilize). Here's an example of how you might implement it (there are probably better ways to implement it, and definitely better ways to choose the initial points). If you really want to avoid doing things iteratively in Python, it's basically how you would do it in numpy:

import tensorflow as tf
import numpy as np
import time

N=10000
K=4
MAX_ITERS = 1000

start = time.time()

points = tf.Variable(tf.random_uniform([N,2]))
cluster_assignments = tf.Variable(tf.zeros([N], dtype=tf.int64))

# Silly initialization:  Use the first two points as the starting                
# centroids.  In the real world, do this better.                                 
centroids = tf.Variable(tf.slice(points.initialized_value(), [0,0], [K,2]))

# Replicate to N copies of each centroid and K copies of each                    
# point, then subtract and compute the sum of squared distances.                 
rep_centroids = tf.reshape(tf.tile(centroids, [N, 1]), [N, K, 2])
rep_points = tf.reshape(tf.tile(points, [1, K]), [N, K, 2])
sum_squares = tf.reduce_sum(tf.square(rep_points - rep_centroids),
                            reduction_indices=2)

# Use argmin to select the lowest-distance point                                 
best_centroids = tf.argmin(sum_squares, 1)
did_assignments_change = tf.reduce_any(tf.not_equal(best_centroids,
                                                    cluster_assignments))

def bucket_mean(data, bucket_ids, num_buckets):
    total = tf.unsorted_segment_sum(data, bucket_ids, num_buckets)
    count = tf.unsorted_segment_sum(tf.ones_like(data), bucket_ids, num_buckets)
    return total / count

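# Recompute each centroid as the mean of the points currently assigned to it.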
means = bucket_mean(points, best_centroids, K)

# Do not write to the assigned clusters variable until after                     
# computing whether the assignments have changed - hence with_dependencies
with tf.control_dependencies([did_assignments_change]):
    do_updates = tf.group(
        centroids.assign(means),
        cluster_assignments.assign(best_centroids))

sess = tf.Session()
sess.run(tf.global_variables_initializer())

changed = True
iters = 0

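# Keep updating until the assignments stop changing or MAX_ITERS is reached.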
while changed and iters < MAX_ITERS:
    iters += 1
    [changed, _] = sess.run([did_assignments_change, do_updates])

[centers, assignments] = sess.run([centroids, cluster_assignments])
end = time.time()
print ("Found in %.2f seconds" % (end-start)), iters, "iterations"
print "Centroids:"
print centers
print "Cluster assignments:", assignments


(Note that a real implementation would need to be more careful about initial cluster selection, avoiding degenerate cases such as all points ending up in one cluster, and so on. This is just a quick demo. I've updated my earlier answer to make it a bit clearer and more "example-worthy".)
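One common improvement over the "first K points" initialization above is k-means++-style seeding, where each new centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far (this is also scikit-learn's default for KMeans). Below is a minimal NumPy sketch of that idea; the names kmeans_pp_init and points_np are illustrative and not part of the original answer, and it assumes you generate the points in NumPy first so the seeds can be fed into the centroids variable.

import numpy as np

def kmeans_pp_init(data, k, seed=0):
    # k-means++-style seeding: spread out the initial centroids by sampling
    # each new one with probability proportional to its squared distance
    # from the nearest centroid already chosen.
    rng = np.random.default_rng(seed)
    centroids = [data[rng.integers(len(data))]]
    for _ in range(k - 1):
        diffs = data[:, None, :] - np.stack(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=2), axis=1)
        centroids.append(data[rng.choice(len(data), p=d2 / d2.sum())])
    return np.stack(centroids)

# For example, instead of the tf.slice initialization above:
# points_np = np.random.uniform(size=(N, 2)).astype(np.float32)
# points = tf.Variable(points_np)
# centroids = tf.Variable(kmeans_pp_init(points_np, K))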

Regarding "k-means - How to implement k-means with TensorFlow?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33621643/
