python - Reading a NumPy matrix in batches in TensorFlow

I am trying to run some regression models on the GPU, but GPU utilization is very low, only around 20%. Going through my code, I found this loop:

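for i in range(epochs):
    # Draw batch_size random row indices (with replacement) from range(args.train_pr)
    rand_index = np.random.choice(args.train_pr, size=args.batch_size)
    # Gather the matching rows of the training inputs and targets
    rand_x = X_train[rand_index]
    rand_y = Y_train[rand_index]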

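I use these three lines to pick a random batch for each iteration. So my question is: while training is running, can I already prepare the batch for the next iteration?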

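I am working on a regression problem, not a classification problem. I have looked at threading in TensorFlow, but the only examples I found were for images; I found none for training on a large matrix such as a 100000 × 1000 one.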
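For scale, here is a back-of-the-envelope sketch of how much host memory the 100000 × 1000 matrix from the question occupies. The question does not state the dtype, so both float32 and float64 are assumed; `matrix_megabytes` is a hypothetical helper that computes the size without allocating the array.

```python
import numpy as np

# Shape of the training matrix mentioned in the question
n_rows, n_cols = 100_000, 1_000

def matrix_megabytes(dtype):
    """Size in MB (10**6 bytes) of an n_rows x n_cols array of `dtype`, computed without allocating it."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 1e6

for dtype in (np.float32, np.float64):
    print(np.dtype(dtype).name, matrix_megabytes(dtype), "MB")
# float32 400.0 MB
# float64 800.0 MB
```

At these sizes the array fits in host memory on a typical machine, which is why the answer below keeps it there and feeds it to TensorFlow in batches rather than embedding it in the graph.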

Best answer

You have a large NumPy array in host memory. You want to be able to process it in parallel on the CPU and send batches to the device.
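To make that idea concrete without TensorFlow, here is a minimal plain-Python sketch of the overlap the question asks about: a background thread draws the next random batches from the NumPy arrays (with replacement, like the question's loop) while the main loop consumes the current one. This is an illustration of the concept only, not how tf.data works internally; tf.data provides the same overlap natively (for example via `Dataset.prefetch`). `batch_producer`, `train`, the queue size, and the toy arrays are hypothetical stand-ins for the question's `X_train`/`Y_train` loop.

```python
import queue
import threading

import numpy as np

def batch_producer(X, Y, batch_size, n_batches, out_q, seed=0):
    """Hypothetical helper: put n_batches random (x, y) batches on out_q, then a None sentinel."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        # Same sampling as the question's loop: random row indices, with replacement
        idx = rng.choice(X.shape[0], size=batch_size)
        out_q.put((X[idx], Y[idx]))  # blocks while the queue is full
    out_q.put(None)  # tell the consumer that no more batches are coming

def train(X, Y, batch_size=32, n_batches=100, prefetch=2):
    """Consume batches while the producer thread prepares the next ones; returns the batch count."""
    q = queue.Queue(maxsize=prefetch)  # at most `prefetch` batches wait in memory
    worker = threading.Thread(target=batch_producer,
                              args=(X, Y, batch_size, n_batches, q), daemon=True)
    worker.start()
    seen = 0
    while True:
        batch = q.get()
        if batch is None:
            break
        rand_x, rand_y = batch
        # ... run one training step on (rand_x, rand_y) here ...
        seen += 1
    worker.join()
    return seen

if __name__ == "__main__":
    X_train = np.random.rand(1_000, 20).astype(np.float32)
    Y_train = np.random.rand(1_000, 1).astype(np.float32)
    print(train(X_train, Y_train))  # 100
```

Because both threads share one Python interpreter, this only helps to the extent that the training step spends its time outside Python, for example inside TensorFlow's C++ runtime.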

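Since TF 1.4, the best way to do this is with tf.data.Dataset, specifically tf.data.Dataset.from_tensor_slices. However, as the documentation points out, you should probably not pass your NumPy array directly as an argument to this function: the array would be embedded in the graph as a constant, which wastes memory because the data is copied several times, and can run into the 2 GB limit of the GraphDef protocol buffer. Use placeholders instead. The example given in the documentation is self-explanatory: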

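import tensorflow as tf  # TensorFlow 1.x graph-mode API

# `features` and `labels` are the NumPy arrays held in host memory
features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# [Other transformations on `dataset`...]
iterator = dataset.make_initializable_iterator()

sess = tf.Session()
# Feed the arrays once, when the iterator is initialized, instead of baking them into the graph
sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})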

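Further preprocessing or data augmentation of the individual slices can be done with the .map method. To make sure these operations actually run in parallel, use only TensorFlow ops and avoid wrapping Python functions with tf.py_func: a py_func body runs in the Python interpreter and is serialized by the GIL.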
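The placeholder example above leaves `[Other transformations on dataset...]` open. Below is a minimal sketch of what could go there for the question's use case, assuming a TensorFlow version that ships `tf.compat.v1` (1.13+ or 2.x); on TF 1.4 the same steps use `tf.placeholder` and `dataset.make_initializable_iterator()` directly. The toy arrays, `batch_size = 32`, `num_parallel_calls=4`, and the `scale` preprocessing function are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import tensorflow as tf

# Graph mode is required for placeholders and initializable iterators on TF 2.x
tf.compat.v1.disable_eager_execution()

# Toy stand-ins for the question's X_train / Y_train (the real data is 100000 x 1000)
features = np.random.rand(1_000, 20).astype(np.float32)
labels = np.random.rand(1_000, 1).astype(np.float32)
batch_size = 32

features_placeholder = tf.compat.v1.placeholder(features.dtype, features.shape)
labels_placeholder = tf.compat.v1.placeholder(labels.dtype, labels.shape)

def scale(x, y):
    """Hypothetical per-example preprocessing built only from TF ops, so map can parallelize it."""
    return x * 2.0 - 1.0, y

dataset = (tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
           .shuffle(buffer_size=features.shape[0])  # new random order every epoch
           .map(scale, num_parallel_calls=4)         # preprocessing on several CPU threads
           .batch(batch_size)
           .prefetch(1))                             # prepare the next batch during the current step

iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
next_x, next_y = iterator.get_next()

with tf.compat.v1.Session() as sess:
    sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                              labels_placeholder: labels})
    n_batches = 0
    while True:
        try:
            # In a real model, run the train op that consumes next_x / next_y here
            bx, by = sess.run([next_x, next_y])
        except tf.errors.OutOfRangeError:  # one full pass over the data is done
            break
        n_batches += 1
    print(n_batches, bx.shape)  # 32 (8, 20)
```

Unlike the question's `np.random.choice` loop, which samples with replacement, shuffling and then batching visits every row exactly once per epoch; the final batch holds the 8 leftover rows because `batch` keeps the remainder by default.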

For this question ("python - Reading a NumPy matrix in batches in TensorFlow") on Stack Overflow, see: https://stackoverflow.com/questions/45201195/
