python - How to get all words of a specific length that contain no digits?

Tags: python regex

I have an input (including Unicode):

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

I want to get all words that contain no digits and are at least 2 characters long. Expected output:

['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ'].

I tried

re.compile(r'[\w]{2,}').findall(s)

and got

['Question1', 'a12', 'is', 'the', 'number', 'of', 'b1', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

Is there any way to get only the words that contain no digits?

Best answer

You may use

import re
s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(re.compile(r'\b[^\W\d_]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

Or, if you only want to match words made of at least 2 ASCII letters:

print(re.compile(r'\b[a-zA-Z]{2,}\b').findall(s))
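To compare the two patterns side by side, here is a small self-contained sketch that runs both on the sample string from the question (the variable names `unicode_words` and `ascii_words` are only illustrative). The ASCII-only variant drops `cầu` and `thủ`: `ầ` and `ủ` are not in `[a-zA-Z]`, yet they are still word characters, so no `\b` falls inside those words and no partial match is returned.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# Any Unicode letter: a word char (\w) that is neither a digit (\d) nor "_"
unicode_words = re.findall(r'\b[^\W\d_]{2,}\b', s)

# ASCII letters only; words containing any other letter are skipped entirely
ascii_words = re.findall(r'\b[a-zA-Z]{2,}\b', s)

print(unicode_words)
# ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
print(ascii_words)
# ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']
```

`re.findall(pattern, s)` is used here as shorthand for `re.compile(pattern).findall(s)`; both return the same list.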


Details

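To match only letters, you need [^\W\d_] (or r'[a-zA-Z]' for an ASCII-only variant)
To match whole words, you need word boundaries, \b
To make sure you define a word boundary rather than a backspace character in the regex pattern, use a raw string literal, r'...'.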
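The last two points can be checked directly. The following is a small sketch (variable names are illustrative) of what goes wrong if you drop either the `\b` anchors or the raw-string prefix. The non-raw pattern is built by concatenation only so that `\W` and `\d` stay intact while `'\b'` is still a normal string literal.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# Without word boundaries, letter runs inside mixed tokens still match:
# "Question" is pulled out of "Question1".
no_boundaries = re.findall(r'[^\W\d_]{2,}', s)
print(no_boundaries[:2])  # ['Question', 'is']

# In a normal (non-raw) string literal, '\b' is the backspace character
# (U+0008), not a word boundary, so this pattern finds nothing in s.
backspace_pattern = '\b' + r'[^\W\d_]{2,}' + '\b'
print('\b' == '\x08')  # True
print(re.findall(backspace_pattern, s))  # []
```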

Putting it together, r'\b[^\W\d_]{2,}\b' defines a regex that matches a word boundary, then two or more letters, and then asserts that no word char follows those letters.
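Since the question asks for words of a "specific length", the `{2,}` quantifier can be made configurable. Below is a sketch of a hypothetical helper, `words_without_digits`, which is not part of the answer: it just builds the same `\b[^\W\d_]{m,n}\b` pattern with adjustable bounds, where `max_len=None` means no upper bound.

```python
import re

def words_without_digits(text, min_len=2, max_len=None):
    # Hypothetical helper: whole words made only of letters (no digits,
    # no underscores) whose length is between min_len and max_len.
    upper = '' if max_len is None else str(max_len)
    # In the rf-string, {{ and }} are literal braces, so with the defaults
    # this yields the answer's pattern: \b[^\W\d_]{2,}\b
    pattern = rf'\b[^\W\d_]{{{min_len},{upper}}}\b'
    return re.findall(pattern, text)

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(words_without_digits(s))
# ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
print(words_without_digits(s, min_len=3, max_len=3))
# ['the', 'the', 'cầu', 'thủ']
```

Passing the same value for `min_len` and `max_len` selects words of exactly that length.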

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/56108377/
