python - How do I find an ellipsis in a Python text string?

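Still fairly new to Python (and Stack Overflow!). I have a dataset containing subject line data (text strings) that I'm using to build a bag-of-words model. I'm creating new variables that flag 0 or 1 for various possible scenarios, but I'm stuck trying to identify where the text contains an ellipsis ("..."). I started from here: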

Data_Frame['Elipses'] = Data_Frame.Subject_Line.str.match('(\w+)\.{2,}(.+)')

For obvious reasons, entering ('...') doesn't work, but the regex code above was suggested to me, and it still doesn't work. I also tried this:

Data_Frame['Elipses'] = Data_Frame.Subject_Line.str.match('.\.\.\')

No dice.

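The code shell above works for the other variables I've created, but I'm also having trouble producing a 0-1 output instead of True/False (in R this would be the "as.numeric" argument). Any help with that would be appreciated as well.

Thanks!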

Best answer

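Using search() instead of match() finds an ellipsis anywhere in the text, not only at the start of the string. In Pandas, str.contains() searches the same way and supports regular expressions.

For example, in Pandas: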

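import pandas as pd

df = pd.DataFrame({'Text': ["hello..", "again... this", "is......a test", "Real ellipses… here", "...not here"]})
# Match a word followed by three or more dots, or the single-character ellipsis (…).
# No capturing group is used: str.contains warns when the pattern has match groups.
df['Ellipses'] = df.Text.str.contains(r'\w+\.{3,}|…')

print(df)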

Giving you:

                  Text  Ellipses
0              hello..     False
1        again... this      True
2       is......a test      True
3  Real ellipses… here      True
4          ...not here     False
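
The question also asked for 0/1 flags rather than True/False. A minimal sketch of one way to get them in Pandas, assuming the column has no missing values; `Ellipses_flag` is a hypothetical column name used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Text': ["hello..", "again... this", "is......a test", "Real ellipses… here", "...not here"]})

# str.contains returns booleans; astype(int) maps True/False to 1/0.
# Assumption: no NaN in the column (otherwise pass na=False to str.contains first).
df['Ellipses_flag'] = df.Text.str.contains(r'\w+\.{3,}|…').astype(int)

flags = df['Ellipses_flag'].tolist()
print(flags)
```

This plays roughly the role of as.numeric in R: the boolean column is cast to integers after the match is computed.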

Or without Pandas:

import re

for test in ["hello..", "again... this", "is......a test",  "Real ellipses… here", "...not here"]:
    print(int(bool(re.search(r'\w+(\.{3,})|…', test))))

This matches the three middle test strings, so it prints 0, 1, 1, 1, 0 (one value per line).

Take a look at search-vs-match in the Python documentation, which has a good explanation.
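
To make the difference concrete, here is a small sketch using only the standard `re` module. The subject line is a made-up example, and the pattern is simplified to just the run of dots:

```python
import re

text = "Big sale... ends today"  # hypothetical subject line
pattern = r'\.{3,}'              # three or more literal dots

# re.match only tries the pattern at position 0, so it finds nothing here.
at_start = re.match(pattern, text)

# re.search scans the whole string and finds the dots in the middle.
anywhere = re.search(pattern, text)

print(at_start)
print(anywhere.group())
```

This is likely why the str.match attempts in the question failed on most subject lines: the pattern had to match starting from the very first character.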

To display the matched word:

import re

for test in ["hello..", "again... this", "is......a test",  "...def"]:
    ellipses = re.search(r'(\w+)\.{3,}', test)

    if ellipses:
        # group(1) is the word captured immediately before the dots
        print(ellipses.group(1))

Giving you:

again
is
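
The same extraction can be sketched in Pandas with `str.extract`, which returns the first capture group for each row. This is an illustrative variant rather than part of the original answer; the Series `s` is made up to mirror the test strings above:

```python
import pandas as pd

s = pd.Series(["hello..", "again... this", "is......a test", "...def"])

# str.extract returns the capture group per row; rows without a match become NaN.
# expand=False gives back a Series instead of a one-column DataFrame.
words = s.str.extract(r'(\w+)\.{3,}', expand=False)

matched = words.dropna().tolist()
print(matched)
```

Rows with no word before the dots, or with fewer than three dots, are left as missing values and dropped by `dropna()`.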

Regarding "python - How do I find an ellipsis in a Python text string?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46529659/
