When I run the following code, I get ValueError: Input contains NaN:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
rf.fit(train_features, train_labels);
I ran the following check, which reports no NaN or infinite values, although a different loop does show them in the train_features array:
np.any(np.isnan(train_features))
I have also run the code below, but it did not change the error I get:
train_features = np.nan_to_num(train_features)
train_labels = np.nan_to_num(train_labels)
Please help!
Edit: adding the full relevant code:
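One possible reason np.nan_to_num does not help, offered as a hedged sketch rather than a confirmed diagnosis: by default it replaces infinities with the largest finite float64 value, and scikit-learn's tree-based estimators convert their input to float32, where that value overflows back to infinity. The array x below is hypothetical and is not the asker's data.

```python
import numpy as np

# Hypothetical feature values (an assumption, not the asker's data).
x = np.array([1.0, np.inf, -np.inf, np.nan])

# Default behaviour: NaN -> 0.0, +/-inf -> largest/smallest finite float64.
cleaned = np.nan_to_num(x)

# Those replacement values are far outside the float32 range.
float32_max = float(np.finfo(np.float32).max)
too_large = np.abs(cleaned) > float32_max

# Casting to float32 turns them back into inf (overflow warning silenced here).
with np.errstate(over="ignore"):
    as_float32 = cleaned.astype(np.float32)

# Passing explicit bounds keeps every value representable in float32.
bounded = np.nan_to_num(x, nan=0.0, posinf=float32_max, neginf=-float32_max)
```

Whether capping infinities at the float32 limit makes sense for the model is a separate question; the sketch only shows why the default replacement can still trigger the same error.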
import numpy as np
import pandas as pd
features = pd.read_csv(x)
labels = np.array(features['Actuals'])
features = features.drop('Actuals', axis = 1)
feature_list = list(features.columns)
features = np.array(features)
from sklearn.model_selection import train_test_split
train_features, test_features, train_labels, test_labels = train_test_split(features, labels, test_size = 0.25, random_state = 42)
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
rf.fit(train_features, train_labels);
Best answer
From what I can see in your code, you are only checking for nan, not inf. There may be a better way to do this with numpy, but the pandas approach should work:
with pd.option_context('mode.use_inf_as_na', True):
    # With this option set, isnull() also counts inf, so the sums show which columns have nan or inf values
    print(pd.DataFrame(train_features).isnull().sum())
    print(pd.DataFrame(train_labels).isnull().sum())
With this, you can determine whether there are any nan or inf values. Then you can use fillna to replace them.
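As a NumPy-only alternative to the pandas check above, here is a minimal sketch that counts non-finite entries per column and then fills them with the column median. It may be useful because newer pandas releases have deprecated the mode.use_inf_as_na option. The small train_features array is a hypothetical stand-in, and the median fill is only one possible strategy, not something the answer prescribes.

```python
import numpy as np

# Hypothetical stand-in for train_features (an assumption, not the asker's data).
train_features = np.array([
    [1.0, 2.0, np.inf],
    [np.nan, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, -np.inf, 12.0],
])

# np.isfinite is False for NaN, +inf and -inf, so this counts all of them per column.
bad_per_column = (~np.isfinite(train_features)).sum(axis=0)
print(bad_per_column)

# Turn every non-finite entry into NaN so a single fill step handles them all.
cleaned = np.where(np.isfinite(train_features), train_features, np.nan)

# Column medians computed while ignoring the NaN entries.
col_medians = np.nanmedian(cleaned, axis=0)

# Fill each missing entry with the median of its own column.
rows, cols = np.where(np.isnan(cleaned))
cleaned[rows, cols] = col_medians[cols]
```

If this approach is used, the medians would normally be computed on the training split only and then reused to fill the test split, so that no information from the test data leaks into training.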
Source: a similar question about python (unable to figure out how to clear NaNs in a random forest) on Stack Overflow: https://stackoverflow.com/questions/59293576/