Python re.search() and re.findall()

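I am trying to solve this problem from Hackerrank. It is a machine learning problem. As a first step, I tried to read all the words from the corpus file to build unigram frequencies. According to this ML problem, a word is defined as: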

Word is a sequence of characters containing only letters from a to z (lowercase only) and can contain hyphens (-) and apostrophe ('). Word should begin and end with lowercase letters only.

I wrote a regular expression in Python like this:

pat = "[a-z]+( ['-]+[a-z]+ ){0,}"

I tried using both re.search() and re.findall(), and I have a problem with each of them.

Problem with re.findall():
```
string = "HELLO W-O-R-L-D"
```
Output of re.findall():
```
[('Hello', ''), ('W', '-D')]
```
I am not able to get the word W-O-R-L-D. When using re.search(), I am able to get it correctly.
Problem with re.search():
```
string = "123hello456world789"
```
Output of re.search():
```
'hello'
```
In this case, when using re.findall(), I can get both 'hello' and 'world'.

Best answer

As I posted on your previous question, you should use re.findall(). Either way, your problem is that your regular expression is wrong. See the following example:

>>> import re
>>> regex = re.compile(r'([a-z][a-z-\']+[a-z])')
>>> regex.findall("HELLO W-O-R-L-D") # this has uppercase
[]  # there are no results here, because the string is uppercase
>>> regex.findall("HELLO W-O-R-L-D".lower()) # lowercase the input first
['hello', 'w-o-r-l-d'] # now we have results
>>> regex.findall("123hello456world789")
['hello', 'world']

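As you can see, the first example you provided fails because the input is uppercase. You could simply add the re.IGNORECASE flag, although you mentioned that matches should be lowercase only.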
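
The two symptoms in the question can be reproduced in a few lines. The sketch below is an illustration, not the pattern from the answer above: `WORD` and `CAPTURING` are hypothetical names, and lowercasing the input before matching is an assumption about what the problem expects. It shows that re.search() returns only the first match, that re.findall() returns every non-overlapping match, and that re.findall() returns the group contents rather than the whole match when the pattern contains a capturing group, as the pattern in the question does.

```python
import re

# Hypothetical pattern for illustration: letters, optionally followed by
# runs of hyphens/apostrophes plus more letters. (?:...) is non-capturing,
# so findall() returns whole matches instead of group contents.
WORD = re.compile(r"[a-z]+(?:['-]+[a-z]+)*")

text = "123hello456world789"

# search() stops at the first match and returns a match object (or None).
first = WORD.search(text)
print(first.group(0))                      # hello

# findall() returns every non-overlapping match as a list of strings.
print(WORD.findall(text))                  # ['hello', 'world']

# Same pattern with a capturing group: findall() now returns only what the
# group captured (its last repetition), or '' when the group did not match.
CAPTURING = re.compile(r"[a-z]+(['-]+[a-z]+)*")
print(CAPTURING.findall("hello w-o-r-l-d"))   # ['', '-d']

# Assumption: uppercase input is lowercased before matching.
print(WORD.findall("HELLO W-O-R-L-D".lower()))  # ['hello', 'w-o-r-l-d']

# This variant also accepts one- and two-letter words.
print(WORD.findall("a cat"))               # ['a', 'cat']
```

One design note: the pattern `[a-z][a-z-\']+[a-z]` shown earlier needs at least three characters to match, so short words such as "a" or "is" would be skipped. If the corpus counts those as words, a form like the one sketched here may be a better fit.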

For more on Python re.search() and re.findall(), see the similar question on Stack Overflow: https://stackoverflow.com/questions/20244228/
