python - TensorFlow CustomOp : multiprocessing not working for CPU

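I have defined a custom op in TensorFlow (TensorFlow 1.13.1). The single-threaded version works fine, but I want to make it multithreaded using work_sharder.h. When I do, it first finds only one worker and then segfaults.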

I define a shard function over the indices of the flattened array:

#include <cfloat>
#include <iostream>  // std::cout is used below

#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor_shape.h"

#include "./work_sharder.h"

using namespace tensorflow;
typedef Eigen::ThreadPoolDevice CPUDevice;

REGISTER_OP("Minimal")
    .Input("input: float")
    .Output("shared_arr: float")
;

class MinimalOp : public OpKernel {
 public:
  explicit MinimalOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {

    const Tensor& input= context->input(0);
    auto input_flat = input.flat<float>();
    const int N = input_flat.size();

    // Create an output tensor of the right shape
    Tensor* shared_arr = NULL;
    OP_REQUIRES_OK(context, context->allocate_output(0, input.shape(),
                                                     &shared_arr));
    // This tensor is going to be shared among threads
    auto shared_arr_flat = shared_arr->flat<float>();

    // Shard function on ranges
    auto shard = [&input_flat, &shared_arr_flat]
                  (int64 start, int64 limit) {
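        // Visit only this shard's range [start, limit). The loop must
        // advance from start to limit; looping on "start < limit" with an
        // unrelated counter never terminates and runs out of bounds.
        for (int64 i = start; i < limit; i++) {
            if (input_flat(i) < 0.) {
                shared_arr_flat(i) = 0.;
            }
        }
    };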

    std::cout<<"Shard definition was okay\n";
    const DeviceBase::CpuWorkerThreads& worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
    std::cout<<"Number of workers = "<<worker_threads.num_threads<<"\n";
    const int64 shard_cost = N;
    Shard(worker_threads.num_threads, worker_threads.workers,
            N, shard_cost, shard);

  }};

REGISTER_KERNEL_BUILDER(Name("Minimal").Device(DEVICE_CPU), MinimalOp);
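
For reference, the contract a shard function has to satisfy can be illustrated outside TensorFlow. The following is only a minimal sketch using std::thread, not TensorFlow's Shard implementation: ShardRange and ClampNegatives are hypothetical names, and the fixed worker count of 4 is an assumption. The point it shows is that each call receives a half-open range [start, limit) and must touch only the indices inside it.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (NOT TensorFlow's Shard): splits [0, total) into
// contiguous blocks and runs work(start, limit) for each block on its own
// thread, then waits for all of them.
void ShardRange(int num_workers, int64_t total,
                const std::function<void(int64_t, int64_t)>& work) {
  if (total <= 0) return;
  num_workers = std::max(1, num_workers);
  // Ceiling division so the blocks cover every index.
  const int64_t block = (total + num_workers - 1) / num_workers;
  std::vector<std::thread> threads;
  for (int64_t start = 0; start < total; start += block) {
    const int64_t limit = std::min(start + block, total);
    threads.emplace_back(work, start, limit);
  }
  for (auto& t : threads) t.join();
}

// Hypothetical example op: writes 0 for negative entries and copies the
// rest. Each worker writes a disjoint index range, so no locking is needed.
std::vector<float> ClampNegatives(const std::vector<float>& input) {
  std::vector<float> output(input.size());
  ShardRange(4, static_cast<int64_t>(input.size()),
             [&input, &output](int64_t start, int64_t limit) {
               for (int64_t i = start; i < limit; ++i) {
                 output[i] = input[i] < 0.f ? 0.f : input[i];
               }
             });
  return output;
}
```

Because the loop variable runs from start to limit, every index is written exactly once regardless of how many blocks the range is split into.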

It compiles without errors.
When I run this multithreaded code from Python:

import tensorflow as tf
import numpy as np


minimal_module = tf.load_op_library("./minimal.so")
tf_minimal = minimal_module.minimal

input_tensor = tf.constant(np.random.normal(size=(100, 100)).astype("float32"))
returned_tensor = tf_minimal(input_tensor)
sess = tf.Session()
sess.run(returned_tensor)

It prints "Number of workers = 1" and then segfaults. The output of g++ --version is:

Apple LLVM version 10.0.1 (clang-1001.0.46.3)
Target: x86_64-apple-darwin18.2.0
Thread model: posix

When I use Python's multiprocessing library, it finds 12 workers.

I compile with:

TF_CFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -undefined dynamic_lookup minimal.cc -o minimal.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2

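Edit:

Following the comments, I installed gcc 4.9 (4.8 is no longer available on brew; some people on that issue said the problem comes from the change between 4.x and 5.x).
I got some strange errors because it could not find the standard library, so I had to do some additional Xcode installation steps, which fixed it.

Now, when compiling (with g++-4.9 instead of g++), I get a lot of warnings (warning: section "__const_coal" is deprecated, etc.).

It does compile, but when I run it I get this error: Symbol not found: __ZN10tensorflow12OpDefBuilder5InputESs

However, neither removing -D_GLIBCXX_USE_CXX11_ABI=0 nor adding it back fixes this.

So I cannot say I have made any progress.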

Best answer

Following the solution in this issue on GitHub,

I changed

-D_GLIBCXX_USE_CXX11_ABI=0

to

-D_GLIBCXX_USE_CXX11_ABI=1

and the problem was solved.
Note that I am using Python 3.7.
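
To see which ABI setting a given compiler actually applies, a small check along these lines may help. It is a sketch rather than part of the original answer, and Cxx11AbiSetting is a hypothetical helper name. The macro is defined only by GNU libstdc++, so with Apple clang and libc++ the function reports that it is undefined.

```cpp
#include <string>

// Hypothetical helper: reports the value of _GLIBCXX_USE_CXX11_ABI as seen
// by this translation unit. libstdc++ defines the macro once a standard
// header is included (or it can be set with -D on the command line); other
// standard libraries such as libc++ leave it undefined.
std::string Cxx11AbiSetting() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return std::to_string(_GLIBCXX_USE_CXX11_ABI);
#else
  return "undefined";
#endif
}
```

Compiling this with the same flags as the op and printing the result shows what the op was built with. The value can then be compared against the flags reported by tf.sysconfig.get_compile_flags().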

Good luck.

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/57427277/
