python - Setting a subset of a pandas DataFrame from another DataFrame

I feel like this question has been asked a million times before, but I can't seem to get it to work or find an SO post that answers it.

I am selecting a subset of a pandas DataFrame and want to change those values individually.

I am selecting the subset of my DataFrame like this:

df.loc[df[key].isnull(), [keys]]

This works well. If I try to set all values to the same value, for example

df.loc[df[key].isnull(), [keys]] = 5

That works too. But if I try to set the subset to a DataFrame, it does not work, and it does not raise an error either.

For example, I have a DataFrame:

import pandas as pd

# Rows with fewer than five items are padded with missing values (NaN)
data = [['Alex',10,0,0,2],['Bob',12,0,0,1],['Clarke',13,0,0,4],['Dennis',64,2],['Jennifer',56,1],['Tom',95,5],['Ellen',42,2],['Heather',31,3]]
df1 = pd.DataFrame(data,columns=['Name','Age','Amount_of_cars','cars_per_year','some_other_value'])

       Name  Age  Amount_of_cars  cars_per_year  some_other_value
0      Alex   10               0            0.0               2.0
1       Bob   12               0            0.0               1.0
2    Clarke   13               0            0.0               4.0
3    Dennis   64               2            NaN               NaN
4  Jennifer   56               1            NaN               NaN
5       Tom   95               5            NaN               NaN
6     Ellen   42               2            NaN               NaN
7   Heather   31               3            NaN               NaN

and a second DataFrame:

data = [[2/64,5],[1/56,1],[5/95,7],[2/42,5],[3/31,7]]
df2 = pd.DataFrame(data,columns=['cars_per_year','some_other_value'])

   cars_per_year  some_other_value
0       0.031250                 5
1       0.017857                 1
2       0.052632                 7
3       0.047619                 5
4       0.096774                 7

I would like to replace those NaNs with the second DataFrame:

df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2

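Unfortunately, this does not work because the indices do not match. So how can I ignore the index when setting the values?

Any help would be appreciated. Sorry if this has been posted before.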

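Best answer

This is only possible if the number of missing values equals the number of rows in df2. In that case, assign the underlying array instead of the DataFrame to prevent index alignment: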

df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values
print (df1)
       Name  Age  Amount_of_cars  cars_per_year  some_other_value
0      Alex   10               0       0.000000               2.0
1       Bob   12               0       0.000000               1.0
2    Clarke   13               0       0.000000               4.0
3    Dennis   64               2       0.031250               5.0
4  Jennifer   56               1       0.017857               1.0
5       Tom   95               5       0.052632               7.0
6     Ellen   42               2       0.047619               5.0
7   Heather   31               3       0.096774               7.0
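
To see why the raw array sidesteps the problem, here is a minimal sketch that rebuilds the two frames and uses `reindex` to imitate the label alignment pandas applies when a DataFrame is on the right-hand side. The names `cols`, `mask`, and `aligned` are illustrative and not from the original post, the missing entries are written out as explicit `None`, and `to_numpy()` is used as an equivalent of `.values`.

```python
import pandas as pd

# Same data as in the question, with the missing cells spelled out as None.
data = [['Alex', 10, 0, 0, 2], ['Bob', 12, 0, 0, 1], ['Clarke', 13, 0, 0, 4],
        ['Dennis', 64, 2, None, None], ['Jennifer', 56, 1, None, None],
        ['Tom', 95, 5, None, None], ['Ellen', 42, 2, None, None],
        ['Heather', 31, 3, None, None]]
df1 = pd.DataFrame(data, columns=['Name', 'Age', 'Amount_of_cars',
                                  'cars_per_year', 'some_other_value'])
df2 = pd.DataFrame([[2/64, 5], [1/56, 1], [5/95, 7], [2/42, 5], [3/31, 7]],
                   columns=['cars_per_year', 'some_other_value'])

cols = ['cars_per_year', 'some_other_value']
mask = df1['cars_per_year'].isnull()

# The selected rows carry labels 3..7, while df2 carries labels 0..4.
# Aligning df2 to the selected labels only finds labels 3 and 4;
# labels 5, 6 and 7 have no partner in df2 and come back as NaN.
aligned = df2.reindex(df1.index[mask])
print(aligned)

# A plain array has no labels, so the values are placed by position.
df1.loc[mask, cols] = df2.to_numpy()
print(df1[cols])
```

The `aligned` frame is only a stand-in for what label alignment does to df2; it is not the exact internal code path of `.loc`.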

If the number of rows differs, you get an error like this:

# df2 now has 4 rows, but 5 rows are selected in df1
data = [[2/64,5],[1/56,1],[5/95,7],[2/42,5]]
df2 = pd.DataFrame(data,columns=['cars_per_year','some_other_value'])

df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values

ValueError: shape mismatch: value array of shape (4,) could not be broadcast to indexing result of shape (5,)
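
One way to avoid hitting that error at assignment time is to compare the counts first. The following is only a sketch under assumptions: the small frames are made up for illustration, and `n_missing` and `assigned` are hypothetical names that do not appear in the original post.

```python
import pandas as pd

# Toy frames: three rows of df1 are missing, but df2 only offers two rows.
df1 = pd.DataFrame({'cars_per_year': [0.0, None, None, None],
                    'some_other_value': [2.0, None, None, None]})
df2 = pd.DataFrame({'cars_per_year': [0.5, 0.25],
                    'some_other_value': [5, 1]})

cols = ['cars_per_year', 'some_other_value']
mask = df1['cars_per_year'].isnull()

# Positional assignment needs exactly one df2 row per selected row.
n_missing = int(mask.sum())
if n_missing == len(df2):
    # Selecting df2[cols] also fixes the column order of the array.
    df1.loc[mask, cols] = df2[cols].to_numpy()
    assigned = True
else:
    assigned = False
    print(f"cannot assign {len(df2)} rows to {n_missing} missing rows")
```

With these toy frames the counts differ, so df1 is left untouched and the message is printed instead of an exception being raised.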

Another idea is to set the index of df2 to the index of the filtered rows of df1, so that alignment matches the rows you want:

df2 = df2.set_index(df1.index[df1['cars_per_year'].isnull()])
df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2
print (df1)
       Name  Age  Amount_of_cars  cars_per_year  some_other_value
0      Alex   10               0       0.000000               2.0
1       Bob   12               0       0.000000               1.0
2    Clarke   13               0       0.000000               4.0
3    Dennis   64               2       0.031250               5.0
4  Jennifer   56               1       0.017857               1.0
5       Tom   95               5       0.052632               7.0
6     Ellen   42               2       0.047619               5.0
7   Heather   31               3       0.096774               7.0
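
As a quick check that the two approaches agree, the sketch below applies both to copies of the same small frame and compares the results. The toy data and the names `relabelled`, `by_labels`, and `by_position` are assumptions made for illustration; unlike the snippet above, df2 itself is not overwritten.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alex', 'Bob', 'Dennis', 'Tom'],
                    'cars_per_year': [0.0, 0.0, np.nan, np.nan],
                    'some_other_value': [2.0, 1.0, np.nan, np.nan]})
df2 = pd.DataFrame({'cars_per_year': [0.5, 0.25],
                    'some_other_value': [5, 7]})

cols = ['cars_per_year', 'some_other_value']
mask = df1['cars_per_year'].isnull()

# Approach 1: give df2 the labels of the selected rows, then let pandas align.
relabelled = df2.set_index(df1.index[mask])
by_labels = df1.copy()
by_labels.loc[mask, cols] = relabelled

# Approach 2: drop the labels entirely and assign by position.
by_position = df1.copy()
by_position.loc[mask, cols] = df2.to_numpy()

# Compare the numeric columns of both results.
same = np.allclose(by_labels[cols].to_numpy(dtype=float),
                   by_position[cols].to_numpy(dtype=float))
print(same)
```

Both variants still require df2 to have exactly as many rows as there are selected rows in df1, since `set_index` needs one label per row.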

This post on setting a subset of a pandas DataFrame from another DataFrame is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/58763795/
