I have two DataFrames. The first is df:
id string_data
1 My name is Jeff
2 Hello, I am John
3 I like Brad he is cool.
The other DataFrame, named allnames, contains a list of names:
id name
1 Jeff
2 Brad
3 John
4 Emily
5 Ross
I want to replace every word in df that appears in allnames['name'] with "Firstname".
Expected output:
id string_data
1 My name is Firstname
2 Hello, I am Firstname
3 I like Firstname he is cool.
I tried this:
nameList = '|'.join(allnames['name'])
df['string_data'].str.replace(nameList, "FirstName", case = False)
But it replaces almost 99% of the words.
Best answer
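Your solution should work if you add word boundaries (\b) to the pattern passed to Series.str.replace, so that each name only matches as a whole word and not inside a longer word: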
# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
nameList = '|'.join(r"\b{}\b".format(x) for x in allnames['name'])
df['string_data'] = df['string_data'].str.replace(nameList, "FirstName", case=False, regex=True)
print(df)
id string_data
0 1 My name is FirstName
1 2 Hello, I am FirstName
2 3 I like FirstName he is cool.
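To see why the word boundaries matter, here is a minimal self-contained sketch. The sample frames are rebuilt from the question. The use of re.escape and the test word "across" are my own additions for illustration, not part of the original answer: re.escape guards against names containing regex metacharacters, and "across" happens to contain the name "Ross".

```python
import re
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": [1, 2, 3],
    "string_data": ["My name is Jeff", "Hello, I am John", "I like Brad he is cool."],
})
allnames = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Jeff", "Brad", "John", "Emily", "Ross"],
})

# Plain alternation: matches a name anywhere, even inside another word
plain = "|".join(allnames["name"])

# Alternation wrapped in word boundaries; re.escape (an added precaution)
# protects against names that contain regex metacharacters
bounded = r"\b(?:{})\b".format("|".join(re.escape(n) for n in allnames["name"]))

# "across" contains "ross", so only the unbounded pattern touches it
across_plain = re.sub(plain, "Firstname", "across", flags=re.IGNORECASE)
across_bounded = re.sub(bounded, "Firstname", "across", flags=re.IGNORECASE)

# regex=True is passed explicitly because case=False needs a regex pattern
result = df["string_data"].str.replace(bounded, "Firstname", case=False, regex=True)
print(result.tolist())
```

With a long name list and case-insensitive matching, short names are likely to occur inside many ordinary words, which would explain the over-replacement described in the question.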
Alternatively, split each string on whitespace and replace the values through a dictionary, using get and join:
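# Map every name to the placeholder
d = dict.fromkeys(allnames['name'], 'Firstname')
# Swap each whitespace-separated token that is a known name; keep the rest
f = lambda x: ' '.join(d.get(y, y) for y in x.split())
df['string_data'] = df['string_data'].apply(f)
print(df)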
id string_data
0 1 My name is Firstname
1 2 Hello, I am Firstname
2 3 I like Firstname he is cool.
Edit: to make the match case-insensitive, convert all values to lowercase with lower:
d = dict.fromkeys([x.lower() for x in allnames['name']], 'Firstname')
f = lambda x: ' '.join(d.get(y.lower(), y) for y in x.split())
df['string_data'] = df['string_data'].apply(f)
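One caveat with splitting on whitespace: a token only matches when it equals a name exactly, so a name followed by punctuation (for example "Brad,") is left alone, and runs of spaces are collapsed by join. A possible workaround, sketched below, is to look up each run of word characters with re.sub and a callable. The helper name replace_names, the lookup set names, and the sample strings are hypothetical and chosen only for illustration.

```python
import re
import pandas as pd

# Lowercased lookup set (hypothetical; in practice built from allnames['name'])
names = {"jeff", "brad", "john", "emily", "ross"}
d = dict.fromkeys(names, "Firstname")

def replace_names(text, placeholder="Firstname"):
    """Replace each whole word found in `names`, leaving punctuation and spacing untouched."""
    return re.sub(
        r"\w+",
        lambda m: placeholder if m.group(0).lower() in names else m.group(0),
        text,
    )

s = pd.Series(["I like Brad, he is cool.", "JOHN and  jeff met Rossi"])

# Whitespace-split version from the answer: "Brad," is not a dictionary key
split_out = s.apply(lambda x: " ".join(d.get(y.lower(), y) for y in x.split()))

# Token-based version: the comma and the double space both survive
regex_out = s.apply(replace_names)

print(split_out.tolist())
print(regex_out.tolist())
```

Because the lookup is an exact match on whole tokens, "Rossi" is not replaced even though it starts with "Ross".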
For more on "python - How to replace words in a DataFrame using another DataFrame in pandas", see the similar question on Stack Overflow: https://stackoverflow.com/questions/56054778/