python - "RuntimeError: No default context installed." when using TensorFlow Federated

Tags: python tensorflow runtime-error tensorflow-federated

I am currently working on a federated learning project using TensorFlow Federated. I was sending requests to the server to check whether my code works when I got this error:

    RuntimeError: No default context installed.
    
    You should not expect to get this error using the TFF API.

However, I only get it under certain specific conditions.

The scenario is as follows (all the code is below):

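An HTTP request is sent from the website. The function `upload_and_train` in `routes/developers.py` handles the request. Inside it, the `start_processing` function is called to start the training preprocessing (collecting the training data, initializing hyperparameters, etc.). Finally, the `federated_computation_new` function is called to start the federated learning, and this is where it crashes, when it reaches the call to `iterative_process.initialize()`: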

    iterative_process = tff.learning.build_federated_averaging_process(
        model_fn,
        client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.5))
    state = iterative_process.initialize()

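Here is the confusing part. If I run the code locally, everything goes smoothly and the training process runs with no errors. If I run it on the server, it also works for the first request. After that it crashes and returns the same error (detailed below) on every subsequent request until I restart the server. Then it works perfectly again for the first call and crashes on the following ones.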

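This problem is driving me crazy and I can't figure it out. My only remaining idea is that something happens after the first call (a process not being closed, or something similar), so the subsequent calls don't get a "fresh" start? Although that shouldn't happen in the first place.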

The full error message is as follows:

    143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
    INFO:werkzeug:143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
     doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
    WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
    ERROR:main:Exception on /api/Developers/use_cases/text_processing/developer_id/4/upload_and_train [POST]
    Traceback (most recent call last):
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
        reraise(exc_type, exc_value, tb)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
        raise value
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/decorator.py", line 48, in wrapper
        response = function(request)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/uri_parsing.py", line 144, in wrapper
        response = function(request)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/validation.py", line 384, in wrapper
        return function(request)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/parameter.py", line 121, in wrapper
        return function(**kwargs)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/routes/developers.py", line 46, in upload_and_train
        last_train_metrics = main_proc.start_processing(use_case,developer_id)
      File "processing/text_processing/main_proc.py", line 17, in start_processing
        state,metrics = federated_computation_new(train_dataset,test_dataset)
      File "processing/text_processing/federated_algorithm.py", line 29, in federated_computation_new
        state = iterative_process.initialize()
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py", line 521, in __call__
        return context.invoke(self, arg)
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 41, in invoke
        self._raise_runtime_error()
      File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 23, in _raise_runtime_error
        raise RuntimeError(
    RuntimeError: No default context installed.

    You should not expect to get this error using the TFF API.

    If you are getting this error when testing a module inside of `tensorflow_federated/python/core/...`, you may need to explicitly invoke `execution_contexts.set_local_execution_context()` in the `main` function of your test.

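The first function, which handles the incoming request. The request carries 4 parameters: two identifiers, `use_case` and `developer_id`, and two formData files containing the training data, which are saved locally: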

    import os
    import sys
    import time
    from os.path import abspath, dirname

    from flask import Response, request


    def upload_and_train(use_case: str, developer_id: int):
        use_case_path = 'processing/' + use_case + '/'
        sys.path.append(use_case_path)
        import main_proc  # imported inside the request handler

        app_path = dirname(dirname(abspath(__file__)))
        file_dict = request.files
        db_File_True = file_dict["dataset_file1"]
        db_File_Fake = file_dict["dataset_file2"]
        true_csv_path = os.path.join(app_path + "/" + use_case_path + "db/", "True.csv")
        fake_csv_path = os.path.join(app_path + "/" + use_case_path + "db/", "Fake.csv")
        db_File_True.save(true_csv_path)
        db_File_Fake.save(fake_csv_path)
        time.sleep(5)  # wait for the files to be copied before proceeding
        # THEN start processing
        last_train_metrics = main_proc.start_processing(use_case, developer_id)  # <============== GOES INTO HERE & CRASHES
        metricsJson = trainMetricsToJSON(last_train_metrics)

        return Response(status=200, response=metricsJson)

The function that starts the preprocessing:

    import time


    def start_processing(use_case, developer_id: int = 0):
        # `globals` is presumably the project's own module, not the built-in globals()
        globals.initialize(use_case, developer_id)
        globals.TRAINER_ID = developer_id

        train_dataset, test_dataset = get_preprocessed_train_test_data()

        state, metrics = federated_computation_new(train_dataset, test_dataset)  # <============== GOES INTO HERE & CRASHES
        trained_metrics = metrics['train']

        timestamp = int(time.time())
        globals.DATASET_ID = timestamp

        written_row = save_to_file_CSV(use_case, globals.TRAINER_ID, timestamp, globals.DATASET_ID,
                                       trained_metrics['sparse_categorical_accuracy'], trained_metrics['loss'])
        return written_row

The function that performs the federated training:

    import pickle

    import tensorflow as tf
    import tensorflow_federated as tff


    def federated_computation_new(train_dataset, test_dataset):
        # Training and evaluating the model
        iterative_process = tff.learning.build_federated_averaging_process(
            model_fn,
            client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.5))
        state = iterative_process.initialize()  # <============== CRASHES HERE

        print(type(state))

        for n in range(globals.EPOCHS):
            state, metrics = iterative_process.next(state, train_dataset)
            print('round  {}, training metrics={}'.format(n + 1, metrics))

        evaluation = tff.learning.build_federated_evaluation(model_fn)
        eval_metrics = evaluation(state.model, train_dataset)
        print('Training evaluation metrics={}'.format(eval_metrics))

        test_metrics = evaluation(state.model, test_dataset)
        print('Test evaluation metrics={}'.format(test_metrics))

        # Save the last trained model
        with open("processing/" + globals.USE_CASE + "/last_model", 'wb') as f:
            pickle.dump(state, f)
        return state, metrics


    def model_fn():
        keras_model = get_simple_LSTM_model()

        return tff.learning.from_keras_model(
            keras_model,
            input_spec=globals.INPUT_SPEC,
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

The function in /home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py, line 521:

    def __call__(self, *args, **kwargs):
        context = self._context_stack.current
        arg = pack_args(self._type_signature.parameter, args, kwargs, context)
        return context.invoke(self, arg)  # <============== This raises the RuntimeError

Thank you very much in advance for your time and patience.

Best answer

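I think we can point to the mechanism that "should" prevent this and give a workaround, but as for diagnosing the root cause, for now I only have guesses.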

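When you run `import tensorflow_federated as tff`, this line should execute, installing an execution context at the base of the global context stack that TFF uses to manage what `__call__` means. It is this context stack that the `__call__` implementation in `function_utils.py` delegates to.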
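
Note that this installation is a side effect of importing the package, and Python runs a module's top-level code only on the first import in a process; later `import` statements (including the `import main_proc` inside `upload_and_train`) return the cached module from `sys.modules`. A minimal demonstration with a throwaway module; the module name `demo_side_effect` and its `STATE` list are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module whose top-level code creates fresh state.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_side_effect.py").write_text("STATE = []\n")
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()  # let the import system notice the new file

first = importlib.import_module("demo_side_effect")   # executes top-level code
first.STATE.append("set during the first request")

second = importlib.import_module("demo_side_effect")  # served from sys.modules
print(second is first, second.STATE)  # True ['set during the first request']

reloaded = importlib.reload(second)   # explicitly re-executes top-level code
print(reloaded.STATE)                 # []
```

So anything a package sets up at import time happens once per process, in whichever thread performs that first import.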

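Before this line executes, a "default" `RuntimeErrorContext` is installed at the bottom of the stack; it raises an exception whenever anyone tries to invoke anything against it (for that matter, ingesting something into this context would also raise, but you are invoking a no-argument computation, so there is no argument to ingest).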
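
A toy model of this mechanism may help: a stack whose bottom entry refuses every invocation until something replaces it, and a computation whose `__call__` looks up the current context and delegates to it, mirroring the `function_utils.py` snippet in the question. This is a simplified sketch, not TFF's code; every class and method name in it is hypothetical.

```python
class ErrorContext:
    """Toy base context: refuses every invocation, like the default described above."""

    def invoke(self, comp, arg):
        raise RuntimeError("No default context installed.")


class ToyExecutionContext:
    """Toy execution context: simply runs the wrapped Python function."""

    def invoke(self, comp, arg):
        return comp.fn() if arg is None else comp.fn(arg)


class ToyContextStack:
    def __init__(self, base_context):
        self._stack = [base_context]

    @property
    def current(self):
        return self._stack[-1]

    def set_base_context(self, context):
        # What an import-time hook would do: replace the bottom of the stack.
        self._stack[0] = context


GLOBAL_STACK = ToyContextStack(ErrorContext())


class ToyComputation:
    """Same shape as the quoted __call__: find the current context, delegate to it."""

    def __init__(self, fn, context_stack=GLOBAL_STACK):
        self.fn = fn
        self._context_stack = context_stack

    def __call__(self, arg=None):
        context = self._context_stack.current
        return context.invoke(self, arg)


initialize = ToyComputation(lambda: {"round": 0})  # a no-argument computation

try:
    initialize()  # only the error context is installed, so this raises
except RuntimeError as err:
    error_message = str(err)

GLOBAL_STACK.set_base_context(ToyExecutionContext())
result = initialize()  # now delegates to the execution context
print(error_message, result)
```

The computation itself never changes; only what sits at the bottom of the stack when it is called decides whether it runs or raises.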

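So I think one possibility here is that this code is simply not running the `__init__.py` file that TFF uses to install the context. That is not obvious to me from the code snippets, but I think it is possible.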

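While we try to diagnose this further, we can offer you a reasonable workaround: if you call `tff.backends.native.set_local_python_execution_context()` (or `set_local_execution_context`, depending on your TFF version) inside your `federated_computation_new` function, this error should go away.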
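
Since the setter's name depends on the TFF version, one option is a small helper that calls whichever of the two names above exists. This is a sketch, not part of TFF: `install_local_context` is a hypothetical helper, and the demonstration uses stand-in objects instead of the real `tff.backends.native` module.

```python
from types import SimpleNamespace

# The two setter names mentioned in the answer, newest first.
_SETTER_NAMES = ("set_local_python_execution_context", "set_local_execution_context")


def install_local_context(backend):
    """Call the first local-context setter that `backend` provides.

    In real code `backend` would be `tff.backends.native`; which setter
    exists depends on the installed TFF version.
    """
    for name in _SETTER_NAMES:
        setter = getattr(backend, name, None)
        if callable(setter):
            setter()
            return name
    raise AttributeError("backend has no local execution context setter")


# Demonstration with stand-in backends instead of real TFF modules.
calls = []
older_backend = SimpleNamespace(set_local_execution_context=lambda: calls.append("old"))
newer_backend = SimpleNamespace(set_local_python_execution_context=lambda: calls.append("new"))
used_old = install_local_context(older_backend)
used_new = install_local_context(newer_backend)
print(used_old, used_new, calls)
```

In the question's code this would be the first statement of `federated_computation_new`, that is `install_local_context(tff.backends.native)`, before `build_federated_averaging_process(...)` and `initialize()`. Calling it on every request, rather than once at startup, means the context is set wherever that request is actually being served.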

Regarding python - "RuntimeError: No default context installed." when using TensorFlow Federated, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69542184/

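Related articles:

python - Is there a way to use bincount with a clause in Python?

python - How to update the training log output after each batch in Keras?

node.js - Tensorflow.js inputShape does not match model input

swift - AVPlayerItemVideoOutput.copyPixelBuffer fails with EXC_BAD_ACCESS

python - pandas.merge is inexplicably slow

python - How to return from a fabric def

python - Pandas Merging 101

tensorflow - How to convert an onnx model to a tensorflow saved model?

excel - Multiple worksheet object errors in VBA code

java - Find out why java.lang.NoClassDefFoundError occurs in a Java LSD counting sort program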