python - Keras/TensorFlow training on GCP with TPUs

Tags: python tensorflow keras google-cloud-platform bucket

I'm trying to train a model on GCP with Keras and TensorFlow 1.15. So far my code is similar to what I ran on Colab, i.e.:

# TPUs
import tensorflow as tf
print(tf.__version__)
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver("tpu-name")
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
print("Number of accelerators: ", tpu_strategy.num_replicas_in_sync)


import numpy as np


np.random.seed(123)  # for reproducibility
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Input
from tensorflow.keras import utils
from tensorflow.keras.datasets import mnist, cifar10
from tensorflow.keras.models import Model

# 4. Load data into train and test sets
(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) =  load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)
print(X_train.shape, X_test.shape)

# 5. Preprocess input data
#X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
#X_test = X_test.reshape(X_test.shape[0], 28, 28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0
X_test /= 255.0

print(y_train.shape, y_test.shape)
# 6. Preprocess class labels One hot encoding
Y_train = utils.to_categorical(y_train, 2)
Y_test = utils.to_categorical(y_test, 2)
print(Y_train.shape, Y_test.shape)

with tpu_strategy.scope():
  model = make_model((img_size, img_size, 3))
  # 8. Compile model
  model.compile(loss='categorical_crossentropy',
                optimizer="sgd",
                metrics=['accuracy'])

model.summary()

batch_size = 1250 * tpu_strategy.num_replicas_in_sync
# 9. Fit model on training data
model.fit(X_train, Y_train, steps_per_epoch=len(X_train)//batch_size,  
            epochs=5, verbose=1)
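As a small aside on step 6 above: `utils.to_categorical` just turns each integer label into a one-hot row. A minimal pure-Python equivalent for illustration (the name `to_one_hot` is mine, not Keras's):

```python
def to_one_hot(labels, num_classes):
    # Each integer label i becomes a row of zeros with a 1.0 at index i,
    # which is what tensorflow.keras.utils.to_categorical produces.
    return [[1.0 if j == label else 0.0 for j in range(num_classes)]
            for label in labels]

print(to_one_hot([0, 1, 1], 2))  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```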

But my data is in a storage bucket and my code runs on a VM, so what do I have to do? I tried loading the data with "gs://BUCKETS" paths, but it doesn't work. What should I do? Edit: I added the code that loads the data, sorry, I had forgotten it.

from os import listdir
from numpy import asarray
from sklearn.utils import shuffle
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def load_data(sets="dogcats/train/", k=5000, target_size=250):
  # define location of dataset
  folder = sets
  photos, labels = list(), list()
  # determine class
  output = 0.0
  for i, dog in enumerate(listdir(folder + "dogs/")):
    if i >= k:
      break
    # load image
    photo = load_img(folder + "dogs/" +dog, target_size=(target_size, target_size))
    # convert to numpy array
    photo = img_to_array(photo)
    # store
    photos.append(photo)
    labels.append(output)

  output = 1.0

  for i, cat in enumerate(listdir(folder + "cats/") ):
    if i >= k:
      break
    # load image
    photo = load_img(folder + "cats/"+cat, target_size=(target_size, target_size))
    # convert to numpy array
    photo = img_to_array(photo)
    # store
    photos.append(photo)
    labels.append(output)

  # convert to a numpy arrays
  photos = asarray(photos)
  labels = asarray(labels)
  print(photos.shape, labels.shape)
  photos, labels = shuffle(photos, labels, random_state=0)
  return photos, labels
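Since `load_data` walks a local directory with `listdir`, it cannot consume `gs://` paths directly. A hedged sketch of a bucket-aware variant — `load_from_bucket` and `label_for` are my names, the bucket/prefix values are placeholders, and the `google-cloud-storage` calls assume an authenticated client as set up in the answer below:

```python
from io import BytesIO

def label_for(blob_name):
    # Mirrors load_data's labelling rule: images under a "dogs/" folder
    # get label 0.0, images under a "cats/" folder get label 1.0.
    return 0.0 if "/dogs/" in blob_name else 1.0

def load_from_bucket(bucket_name, prefix, target_size=250, k=5000):
    """Sketch: list image blobs under `prefix` and build numpy arrays.

    Assumes google-cloud-storage, Pillow and numpy are installed and the
    client is authenticated; imports are deferred so the helpers above
    stay importable without those packages.
    """
    from google.cloud import storage
    from PIL import Image
    import numpy as np

    client = storage.Client()
    photos, labels = [], []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if len(photos) >= 2 * k:  # k dogs + k cats, as in load_data
            break
        data = blob.download_as_string()
        img = Image.open(BytesIO(data)).resize((target_size, target_size))
        photos.append(np.asarray(img, dtype="float32"))
        labels.append(label_for(blob.name))
    return np.asarray(photos), np.asarray(labels)
```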

Edit 2: to complete @daudnadeem's answer, in case someone else runs into the same situation.

My goal was to get the images from a bucket, so the code works fine and returns a bytes object. To convert it to an image you just need to use the PIL library:

from PIL import Image
from io import BytesIO
import numpy as np

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
blob = bucket.get_blob('dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()

img = Image.open(BytesIO(data))
img = np.array(img)

Best Answer

(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) =  load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)

This obviously won't work, because essentially all you did was set a string. What you need to do is download that data as a string and then use it.

First install the package: pip install google-cloud-storage (or pip3 install google-cloud-storage)

pip -> Python

pip3 -> Python3

Take a look at this; you'll need a service account to interact with GCP from code, for authentication purposes.

When you get your service account as a JSON file, you need to do one of two things:

Set it as an environment variable: export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"

Or, my preferred workaround:

gcloud auth activate-service-account \
  <replace-with-email-from-json-file> \
  --key-file=<path/to/your/json/file> --project=<name-of-your-gcp-project>
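For completeness, the google-cloud-storage library also lets you point the client at the key file directly in code. A hedged sketch — `from_service_account_json` is a real constructor on `storage.Client`, but `make_client` is my name and the path is a placeholder:

```python
def make_client(json_path):
    # Alternative to exporting GOOGLE_APPLICATION_CREDENTIALS or running
    # gcloud: build the client straight from the downloaded key file.
    # Assumes google-cloud-storage is installed; the import is deferred
    # so this sketch stays self-contained.
    from google.cloud import storage
    return storage.Client.from_service_account_json(json_path)
```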

Now let's look at how to download a file as a string with the google-cloud-storage library:

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
blob = bucket.get_blob('dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()

Now that you have your data as a string, you can simply pass data into your load function, like this: (X_train, y_train) = load_data(sets=data, target_size=img_size)
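Note that get_bucket wants only the bucket name and get_blob wants the object path without the gs:// scheme, so the question's URLs need splitting first. A small hypothetical helper (`split_gs_url` is my name, not part of the library):

```python
def split_gs_url(url):
    # "gs://BUCKETS/dogscats/train/dogs/1.jpg"
    #   -> ("BUCKETS", "dogscats/train/dogs/1.jpg")
    if not url.startswith("gs://"):
        raise ValueError("not a gs:// URL: " + url)
    bucket_name, _, blob_name = url[len("gs://"):].partition("/")
    return bucket_name, blob_name

print(split_gs_url("gs://BUCKETS/dogscats/train/"))  # ('BUCKETS', 'dogscats/train/')
```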

It sounds complicated, but here's a quick pseudo-layout:

  1. Install google-cloud-storage
  2. Go to Google Cloud Platform Console -> IAM & Admin -> Service Accounts
  3. Create a service account with the relevant permissions (google-cloud-storage)
  4. Download the (JSON) file and remember its location
  5. Activate the service account
  6. Download the file as a string and pass that string to your load_data(data)

Hope that helps!

Regarding python - Keras/TensorFlow training on GCP with TPUs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58879239/
