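I want to replace the direction words in the strings column with NaN when they appear on their own, or when several of them appear joined by a comma and a space.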
id strings
0 1 south
1 2 north
2 3 east
3 4 west
4 5 west, east, south
5 6 west, west
6 7 north, north
7 8 north, south
8 9 West Corporation global office
9 10 West-Riding
10 11 University of West Florida
11 12 Southwest
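My expected result looks like this. Note that if a direction word is part of a phrase or of a longer word, I do not want it replaced.
Is this possible? Thanks.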
id strings
0 1 NaN
1 2 NaN
2 3 NaN
3 4 NaN
4 5 NaN
5 6 NaN
6 7 NaN
7 8 NaN
8 9 West Corporation global office
9 10 West-Riding
10 11 University of West Florida
11 12 Southwest
The code below works, but I am wondering whether there is a more concise way to do it?
df['strings'].astype(str).replace('south', np.nan).replace('north', np.nan)\
.replace('west', np.nan).replace('east', np.nan).replace('west, east', np.nan)\
.replace('west, west', np.nan).replace('north, north', np.nan).replace('west, east, south', np.nan)\
.replace('north, south', np.nan)
Best answer
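First split the values with Series.str.split, forward-fill the missing values that the split leaves in shorter rows, then test whether every value in a row matches the list with DataFrame.isin and DataFrame.all to build a mask, and finally set the matching rows to missing values with Series.mask: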
L = ['south','north','east','west']
m = df['strings'].str.split(', ', expand=True).ffill(axis=1).isin(L).all(axis=1)
df['strings'] = df['strings'].mask(m)
print(df)
id strings
0 1 NaN
1 2 NaN
2 3 NaN
3 4 NaN
4 5 NaN
5 6 NaN
6 7 NaN
7 8 NaN
8 9 West Corporation global office
9 10 West-Riding
10 11 University of West Florida
11 12 Southwest
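To see why the forward fill is needed, here is a minimal sketch of the intermediate steps. The three-row demo frame and the names parts, filled and without_ffill are only for illustration and are not part of the answer above.

```python
import pandas as pd

L = ['south', 'north', 'east', 'west']

# Small illustrative frame (a subset of the question's data).
demo = pd.DataFrame({'strings': ['south', 'west, east, south', 'West-Riding']})

# One column per comma-separated part; shorter rows are padded with missing values.
parts = demo['strings'].str.split(', ', expand=True)

# Without the forward fill, the padding fails the isin test,
# so a single-word row such as 'south' would not be masked.
without_ffill = parts.isin(L).all(axis=1)

# Forward-filling along each row repeats the last real part into the padding.
filled = parts.ffill(axis=1)

# A row is masked only if every part is one of the direction words.
m = filled.isin(L).all(axis=1)

demo['strings'] = demo['strings'].mask(m)
print(demo)
```

Comparing without_ffill with m shows that the single-word row is only caught once the padding has been filled.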
Another idea, using set, isdisjoint and Series.where:
m = [set(x.split(', ')).isdisjoint(L) for x in df['strings']]
df['strings'] = df['strings'].where(m)
print(df)
id strings
0 1 NaN
1 2 NaN
2 3 NaN
3 4 NaN
4 5 NaN
5 6 NaN
6 7 NaN
7 8 NaN
8 9 West Corporation global office
9 10 West-Riding
10 11 University of West Florida
11 12 Southwest
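One thing to be aware of: the two approaches agree on the sample data, but they are not equivalent in general. The isdisjoint test drops a row as soon as any of its parts is a direction word, while the split/isin/all version drops it only when all parts are. A rough sketch of the difference, where the row 'west, Boston' is made up for illustration and issubset is swapped in as a set-based equivalent of the all-parts test:

```python
import pandas as pd

L = ['south', 'north', 'east', 'west']

# 'west, Boston' is a hypothetical mixed row, not part of the question's data.
s = pd.Series(['west, east', 'west, Boston', 'Southwest'])

# isdisjoint: a row is kept only if it shares NO part with L.
keep_disjoint = [set(x.split(', ')).isdisjoint(L) for x in s]

# issubset: a row is dropped only if ALL of its parts are in L.
drop_subset = [set(x.split(', ')).issubset(L) for x in s]

print(s.where(keep_disjoint))  # the mixed row becomes NaN
print(s.mask(drop_subset))     # the mixed row is kept
```

If mixed rows can occur and should be kept, the issubset variant with Series.mask is probably the closer match to the first solution.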
Regarding python - replacing multiple strings in one column with NaN in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60110807/