python - Using reindex with fill_value for categorical and continuous features in the same DataFrame

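I'm encoding my categorical features with pandas.get_dummies when fitting and classifying, and I just noticed that Imputer() puts the mean into the "off" categorical switches that dataframe.reindex() adds when classifying new samples (by default reindex fills those added dummy columns with NaN, so Imputer() treats them as missing).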

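I read this post, which suggests passing fill_value=0 to the reindex call. That seems like a good solution, but I have one nagging question before putting this code into production.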

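Does anyone know whether pandas' DataFrame.reindex sets every NaN to the fill_value, or only the values in the new columns it adds? I want to make sure that any non-categorical data containing NaN is still handled by Imputer().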
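A minimal sketch of the workflow described above, assuming a hypothetical `browser` categorical column and a continuous `response_time` column (neither comes from the original post). The new sample is aligned to the training columns with `reindex(columns=..., fill_value=0)`, so the dummy columns it lacks become 0, while the NaN already present in the continuous column is left for the imputer.

```python
import numpy as np
import pandas as pd

# Hypothetical training data: one categorical and one continuous feature.
train = pd.DataFrame({
    "browser": ["Firefox", "Chrome", "Safari"],
    "response_time": [0.04, 0.02, 0.07],
})
train_encoded = pd.get_dummies(train, columns=["browser"])
train_columns = train_encoded.columns

# Hypothetical new sample: only one browser value and a missing continuous value.
new = pd.DataFrame({"browser": ["Chrome"], "response_time": [np.nan]})
new_encoded = pd.get_dummies(new, columns=["browser"])

# Align to the training layout. Only the dummy columns absent from
# new_encoded are created and filled with 0; the existing NaN in
# response_time is not touched, so an imputer can still handle it.
aligned = new_encoded.reindex(columns=train_columns, fill_value=0)
print(aligned)
```

Because `fill_value` only applies to the columns reindex creates, the categorical "off" switches and the continuous NaN end up being treated differently, which is what the question is after.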

Best Answer

If I understand your question correctly, I believe it fills the missing values in all columns, not only in newly added ones.

From http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html:

import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']
df = pd.DataFrame({
    'http_status': [200, 200, 404, 404, 301],
    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
    index=index)

df

                http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

df.reindex(new_index, fill_value='missing') returns:

                  http_status   response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02

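None of these columns are new, yet the new rows are still filled in every column. I would definitely test my interpretation before putting it into production; I'm not sure I have the full context.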
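The question asks specifically about new columns. A short sketch, rebuilding the `df` from the example above so it runs on its own and using a hypothetical extra column name `is_cached`: `fill_value` applies to labels introduced on either axis, so a new column is filled just like a new row, while existing columns keep their values.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({
    'http_status': [200, 200, 404, 404, 301],
    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
    index=index)

# 'is_cached' is a hypothetical column that df does not have yet;
# reindex creates it and fills every row with fill_value.
wider = df.reindex(columns=['http_status', 'response_time', 'is_cached'],
                   fill_value=0)
print(wider)
```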

Edit:

I should add that values which were already NaN before the call are not filled by .reindex:

import numpy as np
import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']
df = pd.DataFrame({
    'http_status': [200, np.nan, 404, 404, 301],  # a real NaN, not the string 'NaN'
    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
    index=index)

df

           http_status  response_time
Firefox          200.0           0.04
Chrome             NaN           0.02
Safari           404.0           0.07
IE10             404.0           0.08
Konqueror        301.0           1.00

df.reindex(new_index, fill_value='missing') returns:

              http_status response_time
Safari              404.0          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                404.0          0.08
Chrome                NaN          0.02

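Reindexing leaves the existing NaN in Chrome's http_status untouched: fill_value is only used for the labels that reindex introduces (here Iceweasel and Comodo Dragon).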
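One way to check this before relying on it in production is to compare the result against the labels reindex actually introduced. This is a sketch using the same data as the edit above, with `np.nan` as the missing status code; the helper variable names `result` and `added` are illustrative only.

```python
import numpy as np
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']
df = pd.DataFrame({
    'http_status': [200, np.nan, 404, 404, 301],
    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
    index=index)

result = df.reindex(new_index, fill_value='missing')

# Labels that exist in new_index but not in the original df (sorted).
added = pd.Index(new_index).difference(df.index)

# Every cell in an added row received fill_value.
assert (result.loc[added] == 'missing').all().all()
# The NaN that was already in df survives untouched.
assert pd.isna(result.loc['Chrome', 'http_status'])
```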

Original question on Stack Overflow: https://stackoverflow.com/questions/42167643/
