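python - Replacing double pipes || in Pandas or Python

I'm working with some data that uses "||" as a delimiter inside a single string. I have an Excel file with over 60 worksheets and 100k individual records that contain these "||"-separated interests. For example: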

email              interests
[email protected]  Sports||IT||Business||Other

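I tried the following code to replace the pipes, but it doesn't seem to work. Are pipes treated as special characters? Googling didn't turn up any Python-specific results.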

import pandas as pd
df = pd.read_excel("test.xlsx")
df["interests"] = df["interests"].replace('||', ' , ')

For some reason, using str.replace just adds lots of commas between every character.

Any help would be greatly appreciated!

Best answer

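Series.replace(..., regex=False, ...) uses regex=False by default, which means a string pattern is compared against the whole cell value. Only cells that match it exactly are replaced; substrings are left alone.

Demo: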

In [25]: df = pd.DataFrame({'col':['ab ab', 'ab']})

In [26]: df
Out[26]:
     col
0  ab ab
1     ab

In [27]: df['col'].replace('ab', 'XXX')
Out[27]:
0    ab ab        # <--- NOTE!
1      XXX
Name: col, dtype: object

In [28]: df['col'].replace('ab', 'ZZZ', regex=True)
Out[28]:
0    ZZZ ZZZ
1        ZZZ
Name: col, dtype: object

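So don't forget to pass the regex=True parameter (with a raw string, so the backslashes reach the regex engine unchanged):

In [23]: df["interests"] = df["interests"].replace(r'\|\|', ' , ', regex=True)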

In [24]: df
Out[24]:
               email                       interests
0  [email protected]  Sports , IT , Business , Other
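
The question mentions more than 60 worksheets, while pd.read_excel reads only the first sheet by default. Passing sheet_name=None returns a dict that maps sheet names to DataFrames, so the same replacement can be applied in a loop. This is a minimal sketch: the in-memory dict stands in for the real file, and the sheet names and the assumption that every sheet has an "interests" column are hypothetical.

```python
import pandas as pd

# Stand-in for: sheets = pd.read_excel("test.xlsx", sheet_name=None)
# The sheet names and contents here are made up for illustration.
sheets = {
    "Sheet1": pd.DataFrame({"interests": ["Sports||IT"]}),
    "Sheet2": pd.DataFrame({"interests": ["Business||Other"]}),
}

# Assumes every sheet has an "interests" column.
for name, sheet in sheets.items():
    sheet["interests"] = sheet["interests"].replace(r"\|\|", " , ", regex=True)

print(sheets["Sheet2"]["interests"].iloc[0])
```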

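Or use Series.str.replace(). Older pandas versions treated its pattern as a regular expression by default; since pandas 2.0 the default is regex=False, so pass regex=True explicitly:

df["interests"] = df["interests"].str.replace(r'\|\|', ' , ', regex=True)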
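
Because the delimiter is a fixed string, a literal replacement avoids escaping altogether. The sketch below assumes a small made-up DataFrame in place of the Excel data and passes regex=False so that "||" is treated as a plain substring.

```python
import pandas as pd

# Made-up sample rows standing in for the Excel data.
df = pd.DataFrame({"interests": ["Sports||IT||Business||Other", "IT||Other"]})

# regex=False treats "||" as a literal substring, so no escaping is needed.
df["interests"] = df["interests"].str.replace("||", " , ", regex=False)

print(df["interests"].tolist())
```

Setting regex explicitly keeps the behavior the same across pandas versions, whichever default is in effect.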

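P.S. Besides that, | is a special regex symbol meaning OR, so it has to be escaped with a backslash. Unescaped, the pattern '||' is an alternation of empty strings that matches at every position, which is why str.replace put a comma between every character.

Regarding python - Replacing double pipes || in Pandas or Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49495983/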
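
The effect of the unescaped pipes can be reproduced with the standard re module alone. This is a small sketch, not part of the original answer; the variable names are illustrative.

```python
import re

text = "Sports||IT||Business||Other"

# Unescaped, '||' is an alternation of empty patterns, so it matches the
# empty string at every position and the replacement lands around each character.
broken = re.sub('||', ',', 'ab')

# Escaping the pipes, by hand or with re.escape, matches the literal delimiter.
fixed = re.sub(r'\|\|', ' , ', text)
fixed_escaped = re.sub(re.escape('||'), ' , ', text)

# The built-in str.replace never uses regex, so it needs no escaping.
fixed_plain = text.replace('||', ' , ')

print(broken)  # ,a,b,
print(fixed)   # Sports , IT , Business , Other
```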
