python - Prevent Imputer from dropping values

Tags: python pandas dataframe regression imputation

Currently I am trying to impute a dependent variable with pandas. (Don't ask why.) Here is the dataset:

y.head(15)

Out[138]: 
0     13495.0
1     16500.0
2     16500.0
3     13950.0
4     17450.0
5     15250.0
6     17710.0
7     18920.0
8     23875.0
9         NaN
10    16430.0
11    16925.0
12    20970.0
13    21105.0
14    24565.0
Name: price, dtype: float64

If I try to impute this variable, something odd happens:

len(y) # 15

from sklearn.preprocessing import Imputer
mean_imputer_y = Imputer(strategy="mean", axis=0)
imputed_y = mean_imputer_y.fit_transform(y)

len(imputed_y) # 14

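This is obviously the exact opposite of what Imputer is supposed to do. I don't want to drop the NaNs. I want to impute them.

Is there an explanation for this behavior? What am I doing wrong?

Thanks for your help!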

Best answer

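You should use axis=1 instead of 0. A 1-D Series is most likely being treated as a single row of 15 values, so with axis=0 every column holds just one value, and the column that contains only NaN is dropped rather than filled. With axis=1 the mean is taken along that single row.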

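import numpy as np
from sklearn.preprocessing import Imputer

# axis=1 imputes along the row, so the NaN is replaced by the mean of the other values
mean_imputer_y = Imputer(strategy="mean", axis=1, missing_values=np.nan)

# df.Val is the answerer's column holding the same values as y in the question
mean_imputer_y.fit_transform(df.Val)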


array([[13495. , 16500. , 16500. , 13950. , 17450. , 15250. , 17710. ,
        18920. , 23875. , 18117.5, 16430. , 16925. , 20970. , 21105. ,
        24565. ]])
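
For readers on a newer scikit-learn: `Imputer` was removed in version 0.22, and its replacement `SimpleImputer` has no `axis` argument and expects 2-D input. The following is only a sketch of how the same result might be reached today, assuming the 15 values shown in the question; the names `imputed_y` and `filled` are just illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# The 15 values from the question, with one NaN at position 9
y = pd.Series(
    [13495.0, 16500.0, 16500.0, 13950.0, 17450.0, 15250.0, 17710.0, 18920.0,
     23875.0, np.nan, 16430.0, 16925.0, 20970.0, 21105.0, 24565.0],
    name="price",
)

# SimpleImputer works column-wise on 2-D data, so turn the Series into
# a one-column DataFrame first, then flatten the result back to 1-D
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_y = imputer.fit_transform(y.to_frame()).ravel()

# Plain pandas alternative that keeps the Series and its index
filled = y.fillna(y.mean())

print(len(imputed_y), imputed_y[9])
```

Both variants keep all 15 entries and fill the gap with the mean of the remaining 14 values. The pandas version is probably the simpler choice when no fitted imputer needs to be reused on other data.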

Regarding "python - Prevent Imputer from dropping values", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48235420/
