python - Running CrossValidationCV in parallel

Tags: python parallel-processing scikit-learn cross-validation

When I run GridSearchCV() or RandomizedSearchCV() in parallel (with the n_jobs>1 or n_jobs=-1 option set), it shows this message:

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information

I put the code in a class in one .py file and call it under if __name__ == '__main__' in another .py file, but it still shows this message.

It works fine with n_jobs=1.

import platform; print(platform.platform())
Windows-10-10.0.10586-SP0
import numpy; print("NumPy", numpy.__version__)
NumPy 1.13.1
import scipy; print("SciPy", scipy.__version__)
SciPy 0.19.1
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.0


Update

I tried this code, but it still gives me the same error:

import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

class Test():
    def __init__(self):
        attributes = [..]
        dataset = pd.read_csv("..")
        X = dataset[[..]]
        Y = dataset[...]
        model = DecisionTreeClassifier()
        model = RandomizedSearchCV(....)
        model.fit(X, Y)

if __name__ == '__main__':
    Test()

Best answer

joblib is known for this behavior, and its documentation is quite explicit about it:

Warning

Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this:

import ....

def function1(...):
    ...

def function2(...):
    ...

...
if __name__ == '__main__':
    # do stuff with imports and functions defined above
    ...

No code should run outside of the "if __name__ == '__main__'" blocks, only imports and definitions.

So refactor your code to meet this clearly defined requirement, and it will start benefiting from the power of the joblib tools.
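
As a rough illustration of that layout applied to a randomized search, here is a minimal sketch. It is not the asker's actual code: the synthetic dataset from make_classification stands in for the CSV file, and the function name run_search and the parameter ranges are hypothetical choices made only for this example. Everything at module level is an import or a definition, and the fit is triggered only inside the guarded block.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier


def run_search(n_jobs=2):
    """Fit a small randomized search and return the fitted search object."""
    # Assumption: synthetic data replaces the CSV file from the question.
    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    # Assumption: a hypothetical search space, chosen only for illustration.
    param_distributions = {
        "max_depth": [2, 4, 6, 8],
        "min_samples_leaf": [1, 2, 5, 10],
    }

    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_distributions=param_distributions,
        n_iter=5,
        cv=3,
        n_jobs=n_jobs,  # values other than 1 ask joblib for parallel workers
        random_state=0,
    )
    search.fit(X, y)
    return search


if __name__ == "__main__":
    # Work starts only here, so a worker process that re-imports this
    # module sees nothing but imports and definitions.
    fitted = run_search(n_jobs=2)
    print(fitted.best_params_)
```

If the class or function lives in a separate module, the same rule presumably applies to the script that is actually launched: that entry-point file needs the guard, and the imported module should not start any work at import time.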

Regarding "python - Running CrossValidationCV in parallel", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48631907/
