python - Setting up a regular expression for a complex string in Python

I have a list of ingredients for a product, like this:

text = 'Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings'

I want to extract all of the text items (ingredients) from it, so that the result looks like this:

ingredientsList= ['Pork and beef', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings']

The regular expression I am currently using is:

ingredients = re.findall(r'\([^()]*\)|([^\W\d]+(?:\s+[^\W\d]+)*)', text)

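But it does not return the text inside the parentheses. I only want to exclude the codes and percentages, while still getting all the ingredients that appear inside parentheses. What should I do here? Thanks in advance.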

Best answer

You can restrict the first alternative so that it matches only codes that start with E followed by digits:

\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)

See the regex demo.

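Now \(E\d+\) matches only substrings like (Exxx) and consumes them without capturing anything, so every other parenthesized substring is left for the second alternative to process. You can also add percentages here to skip them explicitly: \((?:E\d+|\d+(?:[.,]\d+)?%)\)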

Python demo:

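import re

# The first alternative consumes "(E450)"-style codes without capturing them;
# the capture group collects runs of letter-only words.
rx = r"\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings"
# findall returns "" wherever only the first alternative matched, so drop the empties.
res = [x for x in re.findall(rx, s) if x]
print(res)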
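
The percentage-aware variant mentioned above can be sketched the same way. This is a minimal illustration, not part of the original answer: the helper name extract_ingredients and the shortened sample string are assumptions made for the example. On the sample data the result should be the same as with the simpler pattern, because a span like (1,7%) contains no letters for the capture group to pick up; the extra branch mainly makes the intent explicit.

```python
import re

# Skip "(E450)"-style codes and "(1,7%)"-style percentages explicitly;
# everywhere else, capture runs of letter-only words separated by whitespace.
SKIP = r"\((?:E\d+|\d+(?:[.,]\d+)?%)\)"
WORDS = r"([^\W\d]+(?:\s+[^\W\d]+)*)"
rx = re.compile(SKIP + "|" + WORDS)


def extract_ingredients(text):  # hypothetical helper, named for this example only
    # With one capture group, findall returns that group for each match;
    # matches of the SKIP branch produce "" and are filtered out here.
    return [m for m in rx.findall(text) if m]


# Shortened sample (an assumption), based on the string from the question.
s = "water, salt (1,7%), spices (white pepper, nutmeg), stabilizer (E450)"
print(extract_ingredients(s))
```

Note that neither pattern removes a leading article, so an entry such as "a preservative" is returned as written; stripping it would need a separate post-processing step.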

Regarding python - Setting up a regular expression for a complex string in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40259185/
