python - Any idea about "OSError: [Errno 22] Invalid argument" in pickle.dump?

Tags: python pickle dump

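Here is my code. In it, I am trying to split the data from a ".p" file and normalize it into files with different normalizations. The split seems to work, but I cannot save the results to a ".p" file with pickle.dump. Any suggestions about this error?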

import numpy as np
import pandas as pd
import pickle 
import gzip

# in this example tanh normalization is used
# fold 0 is used for testing and fold 1 for validation (hyperparameter selection)
norm = 'tanh'
test_fold = 0
val_fold = 1

def normalize(X, means1=None, std1=None, means2=None, std2=None, feat_filt=None, norm='tanh_norm'):
    """Standardize X column-wise, optionally followed by tanh and a second standardization.

    Statistics that are passed in (e.g. from the training set) are reused;
    statistics left as None are computed from X.
    """
    if std1 is None:
        std1 = np.nanstd(X, axis=0)
    if feat_filt is None:
        # drop constant features, which would cause a division by zero
        feat_filt = std1 != 0
    X = X[:, feat_filt]
    X = np.ascontiguousarray(X)
    if means1 is None:
        means1 = np.mean(X, axis=0)
    X = (X - means1) / std1[feat_filt]
    if norm == 'norm':
        return X, means1, std1, feat_filt
    elif norm == 'tanh':
        return np.tanh(X), means1, std1, feat_filt
    elif norm == 'tanh_norm':
        X = np.tanh(X)
        if means2 is None:
            means2 = np.mean(X, axis=0)
        if std2 is None:
            std2 = np.std(X, axis=0)
        X = (X - means2) / std2
        return X, means1, std1, means2, std2, feat_filt

#contains the data in both feature ordering ways (drug A - drug B - cell line and drug B - drug A - cell line)
#in the first half of the data the features are ordered (drug A - drug B - cell line)
#in the second half of the data the features are ordered (drug B - drug A - cell line)
with gzip.open('X.p.gz', 'rb') as file:
    X = pickle.load(file)

#contains synergy values and fold split (numbers 0-4)
labels = pd.read_csv('labels.csv', index_col=0) 
#labels are duplicated for the two different ways of ordering in the data
labels = pd.concat([labels, labels])

#indices of training data for hyperparameter selection: fold 2, 3, 4
idx_tr = np.where(np.logical_and(labels['fold']!=test_fold, labels['fold']!=val_fold))
#indices of validation data for hyperparameter selection: fold 1
idx_val = np.where(labels['fold']==val_fold)

#indices of training data for model testing: fold 1, 2, 3, 4
idx_train = np.where(labels['fold']!=test_fold)
#indices of test data for model testing: fold 0
idx_test = np.where(labels['fold']==test_fold)

X_tr = X[idx_tr]
X_val = X[idx_val]
X_train = X[idx_train]
X_test = X[idx_test]

y_tr = labels.iloc[idx_tr]['synergy'].values
y_val = labels.iloc[idx_val]['synergy'].values
y_train = labels.iloc[idx_train]['synergy'].values
y_test = labels.iloc[idx_test]['synergy'].values

# normalize the hyperparameter-selection split, reusing training statistics for validation
if norm == "tanh_norm":
    X_tr, mean, std, mean2, std2, feat_filt = normalize(X_tr, norm=norm)
    X_val, mean, std, mean2, std2, feat_filt = normalize(X_val, mean, std, mean2, std2,
                                                          feat_filt=feat_filt, norm=norm)
else:
    X_tr, mean, std, feat_filt = normalize(X_tr, norm=norm)
    X_val, mean, std, feat_filt = normalize(X_val, mean, std, feat_filt=feat_filt, norm=norm)

# normalize the model-testing split, reusing training statistics for the test set
if norm == "tanh_norm":
    X_train, mean, std, mean2, std2, feat_filt = normalize(X_train, norm=norm)
    X_test, mean, std, mean2, std2, feat_filt = normalize(X_test, mean, std, mean2, std2,
                                                          feat_filt=feat_filt, norm=norm)
else:
    X_train, mean, std, feat_filt = normalize(X_train, norm=norm)
    X_test, mean, std, feat_filt = normalize(X_test, mean, std, feat_filt=feat_filt, norm=norm)

# this is the call that raises "OSError: [Errno 22] Invalid argument"
with open('data_test_fold%d_%s.p' % (test_fold, norm), 'wb') as out_file:
    pickle.dump((X_tr, X_val, X_train, X_test, y_tr, y_val, y_train, y_test), out_file)



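This is most likely caused by a bug in the pickle implementation that prevents it from writing files larger than 4 GB. See the related question: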

Python 3 - Can pickle handle byte objects larger than 4GB?
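
A workaround often suggested for this kind of failure is to serialize the object to bytes first and then write those bytes to disk in pieces smaller than 2 GiB. The sketch below assumes the error comes from a single oversized write, which has not been confirmed for the code in the question. The names `dump_in_chunks`, `load_in_chunks` and `MAX_BYTES` are hypothetical helpers for illustration, not part of the pickle module. Protocol 4 is requested explicitly because it is the first pickle protocol designed for very large objects.

```python
import pickle

# Hypothetical limit: stay just below 2 GiB per write/read call.
MAX_BYTES = 2**31 - 1


def dump_in_chunks(obj, path, chunk_size=MAX_BYTES):
    """Pickle obj to path, writing the byte string in bounded chunks."""
    data = pickle.dumps(obj, protocol=4)
    with open(path, 'wb') as f:
        for start in range(0, len(data), chunk_size):
            f.write(data[start:start + chunk_size])


def load_in_chunks(path, chunk_size=MAX_BYTES):
    """Read path in bounded chunks and unpickle the joined bytes."""
    buf = bytearray()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            buf += chunk
    return pickle.loads(bytes(buf))
```

With the variables from the question, the final call would become something like `dump_in_chunks((X_tr, X_val, X_train, X_test, y_tr, y_val, y_train, y_test), 'data_test_fold%d_%s.p' % (test_fold, norm))`. Note that `pickle.dumps` builds the whole byte string in memory, so this approach needs roughly twice the memory of the data being saved.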
