Given the following DataFrame, I want to clean the data by replacing several strings and numbers with NaN: namely '68, Tardeo Road' and '0' in state, '567' in dept, and '#ERROR!' and '123' in phonenumber:
   id  state                                dept                          phonenumber
0   1  Abu Dhabi                            {Marketing}                   123
1   2  MO                                   {Other}                       5635888000
2   3  68, Tardeo Road                      {"Human Resources"}           18006708450
3   4  National Capital Territory of Delhi  {"Human Resources"}           #ERROR!
4   5  Aargau Canton                        {Marketing}                   12032722596
5   6  Aargau Canton                        567                           18003928343
6  18  NB                                   {"Finance & Administration"}  NaN
7  19  0                                    {Sales}                       #ERROR!
8  20  Abu Dhabi                            {"Human Resources"}           NaN
9  21  Aargau                               {"Finance & Administration"}  NaN
I have tried the following code:
Solution 1:
mask = (df.state == '0') | (df.state == '68, Tardeo Road')
df.loc[mask, ['state']] = np.nan
Solution 2:
df.loc[(df.state == '68, Tardeo Road') | (df.state == 0), 'state'] = np.nan
Solution 3:
df.loc[df.state == '0', 'state'] = np.nan
df.loc[df.state == '68, Tardeo Road', 'state'] = np.nan
All of these work, but applying them to several columns gets rather long.
Is there a more concise and efficient way to do this, for example with str.replace? Thanks.
Best answer
You can use replace with a dict that maps each column to the values to drop:
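import numpy as np
# Keys are column names; each list holds the values to turn into NaN in that column.
df = df.replace({'state': ['68, Tardeo Road', '0'],
                 'dept': ['567'],
                 'phonenumber': ['#ERROR!', '123']}, np.nan)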
Output:
   id  state                                dept                          phonenumber
0   1  Abu Dhabi                            {Marketing}                   nan
1   2  MO                                   {Other}                       5635888000
2   3  nan                                  {"Human Resources"}           18006708450
3   4  National Capital Territory of Delhi  {"Human Resources"}           nan
4   5  Aargau Canton                        {Marketing}                   12032722596
5   6  Aargau Canton                        nan                           18003928343
6  18  NB                                   {"Finance & Administration"}  nan
7  19  nan                                  {Sales}                       nan
8  20  Abu Dhabi                            {"Human Resources"}           nan
9  21  Aargau                               {"Finance & Administration"}  nan
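For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question, and it assumes that state, dept and phonenumber hold strings (with real NaN for the missing phone numbers) while id holds integers. The names bad_values and cleaned are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's data. Assumption: id is integer, the
# other columns hold strings, and missing phone numbers are real NaN.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 18, 19, 20, 21],
    "state": ["Abu Dhabi", "MO", "68, Tardeo Road",
              "National Capital Territory of Delhi", "Aargau Canton",
              "Aargau Canton", "NB", "0", "Abu Dhabi", "Aargau"],
    "dept": ["{Marketing}", "{Other}", '{"Human Resources"}',
             '{"Human Resources"}', "{Marketing}", "567",
             '{"Finance & Administration"}', "{Sales}",
             '{"Human Resources"}', '{"Finance & Administration"}'],
    "phonenumber": ["123", "5635888000", "18006708450", "#ERROR!",
                    "12032722596", "18003928343", np.nan, "#ERROR!",
                    np.nan, np.nan],
})

# Column name -> values to turn into NaN in that column only.
bad_values = {
    "state": ["68, Tardeo Road", "0"],
    "dept": ["567"],
    "phonenumber": ["#ERROR!", "123"],
}

# replace returns a new DataFrame; df itself is left unchanged.
cleaned = df.replace(bad_values, np.nan)

# Number of missing values per column after cleaning.
print(cleaned.isna().sum())
```

Note that replace matches whole cell values and is type-sensitive: under the string assumption above, '0' matches the string "0" but would not match an integer 0, so the lists should use whatever type the column actually holds.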
For more on python - replacing multiple strings and numbers in multiple columns with NaN in Pandas, see the similar question on Stack Overflow: https://stackoverflow.com/questions/62144128/