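I have a number of similar DataFrames, and I would like to standardize the NaNs across all of them. For example, if a NaN exists at df1.loc[0,'a'], then every other DataFrame should be set to NaN at that same index location.
I know I could combine the DataFrames into one large multi-indexed DataFrame, but sometimes I find it easier to work with a set of DataFrames that share the same structure.
Here is an example: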
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), columns=['a', 'b', 'c'])
# Place a single NaN at a different location in each frame
df1.loc[3,'a'] = np.nan
df2.loc[1,'b'] = np.nan
df3.loc[0,'c'] = np.nan
print(df1)
print(' ')
print(df2)
print(' ')
print(df3)
Output:
     a   b   c
0  0.0   1   2
1  3.0   4   5
2  6.0   7   8
3  NaN  10  11

   a     b   c
0  0   1.0   2
1  3   NaN   5
2  6   7.0   8
3  9  10.0  11

   a   b     c
0  0   1   NaN
1  3   4   5.0
2  6   7   8.0
3  9  10  11.0
However, I would like df1, df2, and df3 to have NaNs in the same locations:
print(df1)
     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0
Using the answer provided by piRSquared, I was able to extend it to DataFrames of different sizes. Here is the function:
def set_nans_over_every_df(df_list):
    # Find unique index and column values
    complete_index = sorted(set([idx for df in df_list for idx in df.index]))
    complete_columns = sorted(set([col for df in df_list for col in df.columns]))
    # Ensure that every df has the same indexes and columns
    df_list = [df.reindex(index=complete_index, columns=complete_columns) for df in df_list]
    # Find the nans in each df and set nans in every other df at the same location
    mask = np.isnan(np.stack([df.values for df in df_list])).any(0)
    df_list = [df.mask(mask) for df in df_list]
    return df_list
And an example using DataFrames of different sizes:
df1 = pd.DataFrame(np.reshape(np.arange(15), (5,3)), index=[0,1,2,3,4], columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), index=[0,1,2,3], columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.reshape(np.arange(16), (4,4)), index=[0,1,2,3], columns=['a', 'b', 'c', 'd'])
df1.loc[3,'a'] = np.nan
df2.loc[1,'b'] = np.nan
df3.loc[0,'c'] = np.nan
df1, df2, df3 = set_nans_over_every_df([df1, df2, df3])
print(df1)
     a     b     c   d
0  0.0   1.0   NaN NaN
1  3.0   NaN   5.0 NaN
2  6.0   7.0   8.0 NaN
3  NaN  10.0  11.0 NaN
4  NaN   NaN   NaN NaN
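The function above relies on np.isnan, which only works on numeric arrays. As a hedged variant, the same idea can be sketched with pandas-only calls: Index.union to collect the labels and DataFrame.isna to find missing values. The helper name align_and_share_nans is hypothetical, and float input is assumed here so that assigning NaN does not change the dtype.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd


def align_and_share_nans(frames):
    """Hypothetical helper: align frames on the union of their labels,
    then set NaN wherever any aligned frame has a missing value."""
    # Union of all row labels and all column labels (sorted when possible)
    index = reduce(lambda a, b: a.union(b), (f.index for f in frames))
    columns = reduce(lambda a, b: a.union(b), (f.columns for f in frames))
    # Reindexing introduces NaN for labels a frame did not originally have
    aligned = [f.reindex(index=index, columns=columns) for f in frames]
    # Element-wise OR of the per-frame missing-value masks
    shared = reduce(operator.or_, (f.isna() for f in aligned))
    return [f.mask(shared) for f in aligned]


# Same shapes as the example above, but built as floats (an assumption)
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), columns=list("abc"))
df2 = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), columns=list("abc"))
df3 = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4), columns=list("abcd"))
df1.loc[3, "a"] = np.nan
df2.loc[1, "b"] = np.nan
df3.loc[0, "c"] = np.nan

out = align_and_share_nans([df1, df2, df3])
print(out[0])
```

Because DataFrame.isna also recognizes None and NaT, this version should tolerate object or datetime columns, where np.isnan would raise a TypeError.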
Best answer
I would build a mask in numpy and then pass that mask to the pd.DataFrame.mask method.
mask = np.isnan(np.stack([d.values for d in [df1, df2, df3]])).any(0)
print(df1.mask(mask))
     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0
print(df2.mask(mask))
     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0
print(df3.mask(mask))
     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0
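To make the intermediate steps of the mask approach visible, here is a self-contained Python 3 sketch. It assumes every frame has the same shape and labels and holds numeric data. The make_frame helper is hypothetical, and the frames are built as floats (an assumption not in the original) so that assigning NaN does not trigger a dtype change.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Hypothetical helper: a 4x3 float frame with columns a, b, c
    return pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                        columns=["a", "b", "c"])


df1, df2, df3 = make_frame(), make_frame(), make_frame()
df1.loc[3, "a"] = np.nan
df2.loc[1, "b"] = np.nan
df3.loc[0, "c"] = np.nan

frames = [df1, df2, df3]

# Stack the underlying arrays along a new leading axis: shape (3, 4, 3)
stacked = np.stack([d.to_numpy() for d in frames])

# True wherever at least one frame has a NaN at that position: shape (4, 3)
mask = np.isnan(stacked).any(axis=0)
print(mask)

# DataFrame.mask returns a new frame with NaN where the condition is True;
# the original frames are left unchanged
masked = [d.mask(mask) for d in frames]
print(masked[0])
```

Since mask is a plain NumPy array, it is applied by position rather than by label, which is why the frames need identical shape and ordering for this to be meaningful.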
On the topic of python - setting NaNs across multiple pandas DataFrames, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41703618/