I read a CSV file and converted it into a pandas DataFrame with 2 text columns. In one of the columns I have multiple rows of this form:
<suggested-actions-list text =""is this a test?"">suggested-action>Yes</suggested-action><suggested-action>No</suggested-action></suggested-actions-list>"
<choice-list text=""some text""> <choice-option>option1</choice-option><choice-option>option2</choice-option> <choice-option>option3</choice-option></choice-list>
I'd like to select the text between the angle brackets so that I end up with this:
""is this a test?"" Yes No
""some text"" option1 option2 option3
Can anyone give me a hint? Thanks!
Best answer
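import re

s = """
<suggested-actions-list text =""is this a test?""><suggested-action>Yes</suggested-action><suggested-action>No</suggested-action></suggested-actions-list>
<choice-list text=""some text""> <choice-option>option1</choice-option><choice-option>option2</choice-option> <choice-option>option3</choice-option></choice-list>
"""
# Replace each tag with its ""quoted"" attribute (if it has one) plus a space;
# tags without such an attribute become a single space (unmatched \1 -> '' on Python 3.5+).
x = re.sub(r'<(?:.*?)("".*"")?>', r'\1 ', s)
# Collapse runs of spaces into a single space.
x = re.sub(r'[ ]+', ' ', x)
print(x)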
Output:
""is this a test?"" Yes No
""some text"" option1 option2 option3
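The answer's pattern can be spelled out with re.VERBOSE and wrapped in a per-cell function, which makes it easier to apply to a DataFrame column. This is a minimal sketch: the name tags_to_text is hypothetical, and it assumes Python 3.5+, where an unmatched group referenced in the replacement (\1) expands to an empty string instead of raising an error.

```python
import re

# Same pattern as the answer, written out with comments
# (re.VERBOSE ignores unescaped whitespace and allows # comments).
TAG = re.compile(r'''
    <              # opening angle bracket
    (?:.*?)        # tag name and anything else, matched as little as possible
    ("".*"")?      # optional group 1: a ""quoted"" attribute value
    >              # closing angle bracket
''', re.VERBOSE)


def tags_to_text(cell):
    """Hypothetical helper: replace each tag by its quoted attribute (or by nothing)."""
    text = TAG.sub(r'\1 ', cell)      # tag -> ""attr"" plus a space, or just a space
    text = re.sub(r' +', ' ', text)   # collapse runs of spaces
    return text.strip()               # drop the leading/trailing space left behind
```

With pandas, the function can be applied cell by cell, for example df["col"] = df["col"].apply(tags_to_text), where df and "col" stand in for the asker's DataFrame and column name. One caveat of the original pattern: the greedy .* inside group 1 could run past the end of a tag if a single cell contained more than one ""..."" attribute.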
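Note: I had to modify the original text slightly, namely adding a < before the first suggested-action and removing the " at the end of the first element. If that's not right, let me know; we would need to handle it in the code as well.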
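If the column really contains the rows as first posted (no < before the first suggested-action, a stray trailing "), a more tolerant sketch is to collect the wanted pieces directly instead of deleting tags: take the value of the text= attribute, then every run of text that sits between a > and a closing </ tag. The helper name extract_choices is hypothetical. Accepting either "..." or ""..."" is also an assumption: with its default doublequote=True, pd.read_csv normally turns a doubled "" inside a quoted CSV field into a single ", so the DataFrame may hold single quotes even though the raw file shows doubled ones.

```python
import re

# Value of the text= attribute, wrapped in one or two double quotes;
# group 2 is the opening quote run, reused via \2 so the closing quotes match.
ATTR = re.compile(r'text\s*=\s*(("{1,2}).*?\2)')
# Text that follows a '>' and is immediately closed by a '</' end tag.
INNER = re.compile(r'>([^<>]+)</')


def extract_choices(cell):
    """Hypothetical helper: quoted text attribute followed by the element texts."""
    parts = []
    m = ATTR.search(cell)
    if m:
        parts.append(m.group(1))                     # e.g. ""is this a test?""
    parts.extend(t.strip() for t in INNER.findall(cell))
    return ' '.join(p for p in parts if p)           # skip whitespace-only pieces
```

Because this extracts rather than deletes, characters outside the tags (such as the trailing ") are simply ignored. The trade-off is that any text not immediately followed by a closing </...> tag, like the bare suggested-action fragment in the first raw row, is dropped as well.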
A similar question, "python : Select text between angle brackets", can be found on Stack Overflow: https://stackoverflow.com/questions/55040908/