I am currently working on a federated learning project using TensorFlow Federated. While sending requests to the server to check that my code works, I get this error:
RuntimeError: No default context installed.
You should not expect to get this error using the TFF API.
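However, I only run into it under certain conditions.
The scenario is as follows (all code is below):
An HTTP request is sent from the website. The function upload_and_train in routes/developers.py handles the request. Inside it, start_processing is called to begin the pre-training work (collecting the training data, initializing hyperparameters, and so on). Finally, federated_computation_new is called to start the federated learning, and this is where it crashes, at the call iterative_process.initialize().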
iterative_process = tff.learning.build_federated_averaging_process(model_fn,client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))
state = iterative_process.initialize()
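The confusing part is this. If I run the code locally, everything goes smoothly: the training process runs and there are no errors. If I run it on the server, it also works for the first request. After that it crashes and returns the same error (detailed below) on every subsequent request until I restart the server. Then it again works perfectly on the first call and crashes on the following ones.
This problem is driving me crazy and I cannot figure it out. My only remaining idea is that something happens after the first call (a process not being closed, or something similar), so that subsequent calls do not get a "new" start. Even so, it should not happen in the first place.
The full error message is as follows: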
143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
INFO:werkzeug:143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
ERROR:main:Exception on /api/Developers/use_cases/text_processing/developer_id/4/upload_and_train [POST]
Traceback (most recent call last):
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/decorator.py", line 48, in wrapper
response = function(request)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/uri_parsing.py", line 144, in wrapper
response = function(request)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/validation.py", line 384, in wrapper
return function(request)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/parameter.py", line 121, in wrapper
return function(**kwargs)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/routes/developers.py", line 46, in upload_and_train
last_train_metrics = main_proc.start_processing(use_case,developer_id)
File "processing/text_processing/main_proc.py", line 17, in start_processing
state,metrics = federated_computation_new(train_dataset,test_dataset)
File "processing/text_processing/federated_algorithm.py", line 29, in federated_computation_new
state = iterative_process.initialize()
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py", line 521, in __call__
return context.invoke(self, arg)
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 41, in invoke
self._raise_runtime_error()
File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 23, in _raise_runtime_error
raise RuntimeError(
RuntimeError: No default context installed.
You should not expect to get this error using the TFF API.
If you are getting this error when testing a module inside of `tensorflow_federated/python/core/...`, you may need to explicitly invoke `execution_contexts.set_local_execution_context()` in the `main` function of your test.
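The first function, which handles the incoming request. The request carries four parameters: two identifiers, use_case and developer_id, and two formData files containing the training data, which are saved locally.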
def upload_and_train(use_case: str, developer_id: int):
    use_case_path = 'processing/'+use_case+'/'
    sys.path.append(use_case_path)
    import main_proc
    app_path = dirname(dirname(abspath(__file__)))
    file_dict = request.files
    db_File_True = file_dict["dataset_file1"]
    db_File_Fake = file_dict["dataset_file2"]
    true_csv_path = os.path.join(app_path+"/"+use_case_path+"db/", "True.csv")
    fake_csv_path = os.path.join(app_path+"/"+use_case_path+"db/", "Fake.csv")
    db_File_True.save(true_csv_path)
    db_File_Fake.save(fake_csv_path)
    time.sleep(5) #wait for the files to be copied before proceeding
    #THEN start processing
    last_train_metrics = main_proc.start_processing(use_case,developer_id) # <============== GOES INTO HERE & CRASHES
    metricsJson = trainMetricsToJSON(last_train_metrics)
    return Response(status=200, response=metricsJson)
The function that starts the preprocessing:
def start_processing(use_case, developer_id:int = 0):
    globals.initialize(use_case,developer_id)
    globals.TRAINER_ID = developer_id
    train_dataset, test_dataset= get_preprocessed_train_test_data()
    state,metrics = federated_computation_new(train_dataset,test_dataset) # <============== GOES INTO HERE & CRASHES
    trained_metrics= metrics['train']
    timestamp = int(time.time())
    globals.DATASET_ID = timestamp
    written_row = save_to_file_CSV(use_case,globals.TRAINER_ID,timestamp,globals.DATASET_ID,trained_metrics['sparse_categorical_accuracy'],trained_metrics['loss'])
    return written_row
The function that performs the federated training:
def federated_computation_new(train_dataset,test_dataset):
    # Training and evaluating the model
    iterative_process = tff.learning.build_federated_averaging_process(model_fn,client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))
    state = iterative_process.initialize() # <============== CRASHES HERE
    print(type(state))
    for n in range(globals.EPOCHS):
        state, metrics = iterative_process.next(state, train_dataset)
        print('round {}, training metrics={}'.format(n+1, metrics))
    evaluation = tff.learning.build_federated_evaluation(model_fn)
    eval_metrics = evaluation(state.model, train_dataset)
    print('Training evaluation metrics={}'.format(eval_metrics))
    test_metrics = evaluation(state.model, test_dataset)
    print('Test evaluation metrics={}'.format(test_metrics))
    #############################################################################################
    #Save Last Trained Model
    import pickle
    with open("processing/"+globals.USE_CASE+"/last_model",'wb') as f:
        pickle.dump(state, f)
    return state,metrics

def model_fn():
    keras_model = get_simple_LSTM_model()
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=globals.INPUT_SPEC,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
The function in /home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py, line 521:
def __call__(self, *args, **kwargs):
    context = self._context_stack.current
    arg = pack_args(self._type_signature.parameter, args, kwargs, context)
    return context.invoke(self, arg) # <============== This returns the runtime Error
Thank you very much in advance for your time and patience.
Best answer
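I think we can point to the mechanism that "should" prevent this and offer a workaround, but as for diagnosing the root cause, I only have guesses at the moment.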
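When you run `import tensorflow_federated as tff`, a line in TFF's `__init__.py` should execute, installing an execution context at the base of the global context stack that TFF uses to manage what `__call__` means. It is this context stack that the `__call__` implementation in `function_utils.py` delegates to.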
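Before that line executes, a "default" `RuntimeErrorContext` sits at the bottom of the stack, and it raises whenever anyone tries to invoke anything against it. (For that matter, ingesting something into this context would raise as well, but you are invoking a no-arg computation, so there is no argument to ingest.)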
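The mechanism described above can be sketched in plain Python. This is only a toy model under stated assumptions: `ContextStack`, `LocalContext`, and `Computation` are illustrative names invented here, not TFF's real classes, and only `RuntimeErrorContext` and the error text echo the traceback in the question.

```python
class RuntimeErrorContext:
    """Placeholder context: any attempt to use it raises."""

    def ingest(self, value):
        raise RuntimeError("No default context installed.")

    def invoke(self, comp, arg):
        raise RuntimeError("No default context installed.")


class LocalContext:
    """Hypothetical stand-in for a real execution context: it just runs the function."""

    def ingest(self, value):
        return value

    def invoke(self, comp, arg):
        return comp.fn() if arg is None else comp.fn(arg)


class ContextStack:
    """Toy context stack whose base starts out as the raising placeholder."""

    def __init__(self):
        self._default = RuntimeErrorContext()
        self._stack = []

    def set_default_context(self, ctx):
        self._default = ctx

    @property
    def current(self):
        return self._stack[-1] if self._stack else self._default


class Computation:
    """Toy computation whose __call__ delegates to the current context."""

    def __init__(self, fn, context_stack):
        self.fn = fn
        self._context_stack = context_stack

    def __call__(self, arg=None):
        context = self._context_stack.current
        if arg is not None:
            arg = context.ingest(arg)  # skipped for no-arg computations
        return context.invoke(self, arg)


stack = ContextStack()
initialize = Computation(lambda: "initial state", stack)

try:
    initialize()  # nothing installed yet, so the placeholder raises
except RuntimeError as err:
    print(err)

stack.set_default_context(LocalContext())  # what the import is expected to do
print(initialize())
```

In this model the same call fails or succeeds depending only on whether a default context was installed on the stack that `__call__` consults.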
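So I think one possibility here is that this code simply is not running the `__init__.py` file that TFF uses to install the context. That is not obvious to me from the snippets, but I think it is possible.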
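One way to gather evidence for or against this guess is to record, on every request, whether the module has already been imported and which process and thread is handling the call. The helper below is a hypothetical diagnostic, not part of TFF; `describe_runtime` and its parameter name are assumptions made for this sketch.

```python
import os
import sys
import threading


def describe_runtime(module_name="tensorflow_federated"):
    """Return facts that help tell whether each request gets a fresh start."""
    return {
        "pid": os.getpid(),                             # which process handles the call
        "thread": threading.current_thread().name,      # which thread handles the call
        "module_imported": module_name in sys.modules,  # was the package already imported?
    }


# Possible use: print this at the top of federated_computation_new on every request.
print(describe_runtime())
```

Comparing the output of the first request with that of later ones may narrow down what changes between the call that works and the calls that fail.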
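While we try to diagnose this further, we can offer a reasonable workaround. If you call `tff.backends.native.set_local_python_execution_context()` (or `set_local_execution_context`, depending on your TFF version) inside your `federated_computation_new` function, this error should go away.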
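A minimal sketch of that workaround follows. It tries the two function names mentioned above in order, so the same code can be used whichever one the installed TFF version exposes. `install_local_context` is a hypothetical helper name, and the module is passed in as a parameter so the snippet itself does not need TFF to be importable.

```python
def install_local_context(native_backend):
    """Call whichever local-context setter this backend module provides.

    `native_backend` is expected to be `tff.backends.native`; the two names
    tried below are the ones mentioned in the answer.
    """
    for name in ("set_local_python_execution_context",
                 "set_local_execution_context"):
        setter = getattr(native_backend, name, None)
        if callable(setter):
            setter()
            return name  # report which setter was used
    raise AttributeError("no local execution context setter found")


# Intended use (assumes tensorflow_federated is installed and imported as tff):
#
#   def federated_computation_new(train_dataset, test_dataset):
#       install_local_context(tff.backends.native)
#       ...build the iterative process and call initialize() as before...
```

Calling it at the top of `federated_computation_new` means the context is installed on every request, before `iterative_process.initialize()` runs.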
Regarding python - "RuntimeError: No default context installed." when using Tensorflow Federated, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69542184/