python - Is it possible to run regular Python code on a Google TPU?

Tags: python tensorflow google-colaboratory tpu

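I'm new to Google TPUs. From what I've read so far, they are optimized specifically for training machine learning models written in TensorFlow. I'm now trying to see how a TPU performs on other kinds of workloads that have nothing to do with machine learning. I've been trying to adapt my code so that it runs on the TPU in Google Colab, but I'm not sure whether it works or whether this is the best approach. Here is my O(n^3) matrix multiplication code: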

import os
import numpy as np
from random import seed
from random import random
import tensorflow as tf
import time

#check that this is running on the TPU
try:
  tpu = tf.contrib.cluster_resolver.TPUClusterResolver() # TPU detection

  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])  
except ValueError:
  print("Running on GPU or CPU")
  tpu = None

#TPU details
if 'COLAB_TPU_ADDR' not in os.environ:
  print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
  tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print ('TPU address is', tpu_address)

def multiplicationComputation():
  #size of matrix
  row_size = 128
  col_size = 128
  N = row_size*col_size

  #class for matrix
  class MatrixMultiplication: 
    matrix1 = np.empty(N) #DO NOT USE np.arange(N)
    matrix2 = np.empty(N)
    product = np.empty(N) #product size is matrix1.rows x matrix2.columns

  #create MatrixMultiplication object
  m = MatrixMultiplication()

  #fill object's data structures
  #seed for matrix 1
  seed(1) 
  for x in range(N):
    value = random()
    m.matrix1[x] = value

  #seed for matrix 2
  seed(7) 
  for x in range(N):
    value = random()
    m.matrix2[x] = value

  #multiply matrix1 and matrix2
  start = time.time()
  qtySaves = 0
  for i in range(row_size):
    for j in range(col_size):
      i_col = i * col_size
      total = 0 #named total to avoid shadowing the built-in sum()
      for k in range(row_size):
        k_col = k * col_size
        multiplication = m.matrix1[i_col + k] * m.matrix2[k_col + j]
        total = total + multiplication

      m.product[i_col + j] = total #The result of the multiplication is saved on the product matrix
      qtySaves = qtySaves + 1

  end = time.time()
  #print result
  print()
  print("Result O(n^3): ")
  for i in range(N):
    if i % row_size == 0 and i > 0:
      print()  
    print(str(m.product[i]), end =" ")

  print()
  print("For n = " + str(N) + ", time is " + str(end - start))

#rewrite computation so it can be executed on the TPU
#tpuOperation = tf.contrib.tpu.rewrite(multiplicationComputation)
tpuOperation = tf.contrib.tpu.batch_parallel(multiplicationComputation, [], num_shards=8)

#run
session = tf.Session(tpu_address, config=tf.ConfigProto(isolate_session_state=True, log_device_placement=True)) #isolate session state = True for distributed runtime
try:
  session.run(tf.contrib.tpu.initialize_system()) #initializes a distributed TPU system
  session.run(tpuOperation)
finally:
  #TPU sessions must be shutdown separately from closing the session
  session.run(tf.contrib.tpu.shutdown_system())
  session.close()

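I'm worried that this isn't running on the TPU. When I call session.list_devices() I see a CPU listed, and I'm concerned that my code may actually be running on the CPU rather than the TPU. Here is the output of that command: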

TPU devices: 
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 10448234186946304259),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 2088593175391423031),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 1681908406791603718),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 2618396797726491975),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 14243051360425930068),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 15491507241115490455),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 9239156557030772892),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 16970377907446102335),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 6145936732121669294),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 11372860691871753999),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 12653526146081894211)]

For now, I'm not looking for advice on which accelerator to use. I want to test the TPU and make sure my code is running on it. Please help!

Best answer

I'm afraid the presence or absence of TensorFlow has no effect on how NumPy (np) operations are executed.

In your example above, when you specify

tpuOperation = tf.contrib.tpu.batch_parallel(multiplicationComputation, [], num_shards=8)

multiplicationComputation contains no TPU-specific code to parallelize, so it runs exactly as it would if you called multiplicationComputation directly: on the CPU.
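As a side note, session.list_devices() only reports the devices the session can see, so a CPU entry in that listing does not by itself tell you where an op ran. Here is a minimal sketch for summarizing such a listing. count_device_types is a hypothetical helper, not part of TensorFlow, and the device names are copied from your output (shortened to three entries):

```python
import re
from collections import Counter

# Device names copied from the listing in the question (shortened).
DEVICE_NAMES = [
    "/job:tpu_worker/replica:0/task:0/device:CPU:0",
    "/job:tpu_worker/replica:0/task:0/device:TPU:0",
    "/job:tpu_worker/replica:0/task:0/device:TPU:1",
]


def count_device_types(names):
    """Count devices per type from full device names (hypothetical helper)."""
    counts = Counter()
    for name in names:
        # The device type sits between "/device:" and the trailing index.
        match = re.search(r"/device:([A-Z_]+):\d+$", name)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


print(count_device_types(DEVICE_NAMES))
```

A non-zero TPU count confirms that the TPU cores are reachable. Whether your computation uses them depends on whether it is built from TF ops.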

You will have to rewrite your code using TF operations for it to run on the TPU. TensorFlow will then translate those operations into TPU-specific code.
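As a rough illustration of what "using TF operations" could look like, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution rather than the 1.x tf.contrib API used in your code, and matmul_with_tf_ops is a name made up for this example. It only shows the computation expressed as TF ops; connecting to the TPU is not shown.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution enabled


def matmul_with_tf_ops(n=128):
    """Multiply two random n x n matrices using only TensorFlow ops."""
    # Stateless random ops give reproducible inputs for fixed seeds,
    # standing in for seed(1) and seed(7) in the question.
    matrix1 = tf.random.stateless_uniform([n, n], seed=[1, 0])
    matrix2 = tf.random.stateless_uniform([n, n], seed=[7, 0])
    # tf.matmul is a TensorFlow op, so TensorFlow can place and compile it
    # for an accelerator; a Python loop over NumPy arrays is not.
    return tf.matmul(matrix1, matrix2)


product = matmul_with_tf_ops(128)
print(product.shape)   # shape of the result tensor
print(product.device)  # device this op was placed on
```

Run as-is, this executes on whatever device TensorFlow picks by default, so it demonstrates the rewrite itself, not TPU placement.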

This answer comes from a similar question on Stack Overflow, "python - Is it possible to run regular Python code on a Google TPU?": https://stackoverflow.com/questions/56572787/
