python - Regular expression extraction using a word list as a reference, Python

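I have a list of prepositions in a txt file. I am writing a function that extracts the word following a preposition from a string. Because there are many prepositions, putting them directly into re.compile is not practical, so I am reading them from the txt file. Here is my code: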

with open("Input.txt"):
    words = "|".join(line.rstrip() for line in open)
pattern = re.compile('{}\s(\w+|\d+\w+)\s\w+'.format(words))

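Here {} stands for the preposition alternatives and \s is a whitespace character, followed by a word or a number-and-word combination such as "20th cross". The error I get is: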

TypeError                                 Traceback (most recent call last)
<ipython-input-43-0aed517ef1ba> in <module>()
  1 with open("Input.txt"):
----> 2     words = "|".join(line.rsplit() for line in open)
  3 pattern = re.compile("{}\s(\w+|\d+\w+)\s\w+".format(words))

TypeError: 'builtin_function_or_method' object is not iterable

The content of Input.txt is ['near','above','towards'...] and so on. How do I iterate over it?

Best answer

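The code is iterating over the open function itself, which is why Python reports that a 'builtin_function_or_method' object is not iterable. You should bind the file object with "as f" and iterate over that to get the lines.

Also, rsplit appears to be a typo for rstrip.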

import re

with open("Input.txt") as f:
    # Assumes one preposition per line; rstrip() drops the trailing newline.
    words = "|".join(line.rstrip() for line in f)
    pattern = re.compile(r'(?:{})\s(\w+|\d+\w+)\s\w+'.format(words))
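The non-capturing group (?:...) around the joined words is what keeps the alternation from swallowing the rest of the pattern. A minimal sketch of the difference, assuming a hypothetical three-word list and sample sentence rather than the real Input.txt:

```python
import re

# Hypothetical stand-ins for the file contents and the input text.
words = "|".join(["near", "above", "towards"])
text = "the shop is near 20th cross road"

# Without a group, "|" splits the whole pattern into
# "near" OR "above" OR "towards\s(...)\s\w+".
ungrouped = re.compile(r'{}\s(\w+|\d+\w+)\s\w+'.format(words))
# With (?:...), the alternation is confined to the prepositions.
grouped = re.compile(r'(?:{})\s(\w+|\d+\w+)\s\w+'.format(words))

ungrouped_match = ungrouped.search(text)
grouped_match = grouped.search(text)

print(ungrouped_match.group(0))  # near
print(ungrouped_match.group(1))  # None
print(grouped_match.group(0))    # near 20th cross
print(grouped_match.group(1))    # 20th
```

In the ungrouped version only the bare preposition matches and the capturing group stays empty, so the word after the preposition is never extracted.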

If the words contain characters that have a special meaning in regular expressions, escape them with re.escape.

with open("Input.txt") as f:
    words = "|".join(re.escape(line.rstrip()) for line in f)
    pattern = re.compile(r'(?:{})\s(\w+|\d+\w+)\s\w+'.format(words))
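To see the escaped pattern in use, here is a rough end-to-end sketch. The helper name build_pattern, the word list (including the abbreviation "opp.") and the sample sentence are all made up for illustration, and io.StringIO stands in for the real file so the snippet runs on its own:

```python
import io
import re

def build_pattern(lines):
    """Build the pattern from an iterable of lines, one word per line (assumed layout)."""
    # Skip blank lines so an empty alternative cannot creep into the pattern.
    words = "|".join(re.escape(line.strip()) for line in lines if line.strip())
    return re.compile(r'(?:{})\s(\w+|\d+\w+)\s\w+'.format(words))

# Hypothetical file contents; io.StringIO stands in for open("Input.txt").
fake_file = io.StringIO("near\nabove\nopp.\n")
pattern = build_pattern(fake_file)

text = "the office is opp. 5th main road, near 20th cross road"
# With a single capturing group, findall returns that group for each match.
following = pattern.findall(text)
print(following)  # ['5th', '20th']
```

Thanks to re.escape, the dot in "opp." is matched literally instead of standing for any character. Note that this sketch assumes one word per line; if Input.txt really holds a Python list literal, as the question's description suggests, it would have to be parsed differently before joining.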

Regarding "python - Regular expression extraction using a word list as a reference, Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22059434/
