python - Match with a Python regular expression but exclude one of the groups

So I have data in the following formats:

<Category: XXX &nbsp;-&nbsp;

or

<Category: XXX</b>


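I only want to keep "XXX", but all I manage to get is ('XXX', '') or ('') or some other unwanted variant.

I don't want to use Beautiful Soup; I am having trouble downloading it with the anaconda package manager.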


Addition - my attempts

'Category: ([^<]+)</b'

produces ['XXX'] when the input is

<Category: XXX</b>


'Category: ([^<]+) &n'

produces ['XXX'] when the input is

<Category: XXX &nbsp;

I thought I could combine the two with something like

'Category: ([^<]+)(</b| &n)'

but the result is

[('XXX', '</b')]

or

[('XXX', ' &n')]

Best answer

>>> import re
>>> # Raw strings keep \s and \w from being read as string escapes.
>>> re.match(r'<Category:\s(\w+)', "<Category: XXX</b>").group(1)
'XXX'
>>> re.match(r'<Category:\s(\w+)', "<Category: XXX &nbsp;-&nbsp;").group(1)
'XXX'

Or using findall:

>>> import re
>>> re.findall(r'<Category:\s(\w+)', "<Category: XXX &nbsp;-&nbsp;")[0]
'XXX'
>>> re.findall(r'<Category:\s(\w+)', "<Category: XXX</b>")[0]
'XXX'
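
Both snippets above assume the tag is always present: re.match returns None when it is not, so .group(1) would raise AttributeError, and indexing an empty findall result would raise IndexError. A minimal sketch of a more defensive wrapper follows. The names CATEGORY_RE and extract_category are hypothetical, and returning None on failure is an assumption about what the caller wants.

```python
import re

# Hypothetical names; the pattern is the one from the answer, compiled once.
CATEGORY_RE = re.compile(r'<Category:\s(\w+)')


def extract_category(text):
    """Return the category name, or None if no category tag is found."""
    # search() scans the whole string, so the tag need not be at position 0.
    match = CATEGORY_RE.search(text)
    return match.group(1) if match else None


print(extract_category("<Category: XXX</b>"))            # XXX
print(extract_category("<Category: XXX &nbsp;-&nbsp;"))  # XXX
print(extract_category("no category here"))              # None
```

Using search rather than match is a design choice here: match only succeeds when the tag starts the string, which may be too strict for text scraped out of a larger page.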

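\s matches any whitespace character.
\w matches any alphanumeric character or underscore; for ASCII-only matching this is equivalent to the set [a-zA-Z0-9_].
\w+ matches one or more such characters, so it stops at the first space or "<".
(...) is a capturing group: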

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed
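
Capturing groups also explain the tuples in the question: when a pattern contains more than one capturing group, findall returns one tuple per match. As a hedged alternative to the \w+ approach, the asker's combined pattern can be kept and its second group made non-capturing with (?:...), or turned into a lookahead with (?=...). The variable names below are illustrative only.

```python
import re

samples = ["<Category: XXX</b>", "<Category: XXX &nbsp;-&nbsp;"]

# Two capturing groups: findall returns a tuple for each match.
two_groups = r'Category: ([^<]+)(</b| &n)'
# (?:...) groups the alternatives without capturing them.
non_capturing = r'Category: ([^<]+)(?:</b| &n)'
# (?=...) only checks that a terminator follows; +? keeps the match minimal.
lookahead = r'Category: ([^<]+?)(?=</b| &n)'

for s in samples:
    print(re.findall(two_groups, s))     # a list holding one 2-tuple
    print(re.findall(non_capturing, s))  # ['XXX']
    print(re.findall(lookahead, s))      # ['XXX']
```

Unlike \w+, these patterns keep a category that contains spaces, for example "Foo Bar" in "<Category: Foo Bar</b>", which may or may not be what the data needs.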

See the documentation for more information.

This question, "python - Match with a Python regular expression but exclude one of the groups", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/18340765/
