python - Add a delimiter to a string before a certain pattern in Python

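I have a list of specific words: ["area", "building", "street no", "floor"]

If any of these words appears in a string followed by a colon (:), I need to add a delimiter (preferably a comma) before that word. For example: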

sample_input = "area: al mansorah street no: 30 building: xyz tower floor: 3rd"

expected_output = "area: al mansorah, street no: 30, building: xyz tower, floor: 3rd"

Here is my current implementation:

        import re

        sentence = "area : al mansorah street no    : 30 building : 6 floor : 3rd"
        words = ["area", "building", "street no", "floor"]
        # one substitution pass over the sentence per word
        for x in words:
            regex = re.escape(x) + r"\s+:"
            rep_str = ", " + x + ":"
            sentence = re.sub(regex, rep_str, sentence)

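This works, but it is inefficient because I have hundreds of such words to check. It also does not cover the edge cases: the delimiter should not be added if the word is the first one in the string, or if a delimiter is already there. Any help would be appreciated.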

Accepted answer

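The regex you are probably looking for is ([^,\s])(\s+(?:your|words|here)\s*:), since it works well in Python and can grow dynamically. Instead of using a for loop to run a regex hundreds of times, you can use a for loop to build one regex that is hundreds of words long and then run it once.

([^,\s]) captures a single character that is neither a comma nor whitespace. If a comma is already there, or the word is the first one on the line, there is no match and nothing is added.
(\s+(?:your|words|here)\s*:) captures one or more whitespace characters, followed by any word from your list, optional whitespace, and a colon.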
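
To see how the two capture groups behave, here is a minimal sketch that fills the pattern in with the four words from the question and applies it to the question's sentence. The variable names `pattern`, `result`, and `already_delimited` are only illustrative.

```python
import re

# The pattern from the answer, with the question's four words filled in by hand.
pattern = re.compile(r"([^,\s])(\s+(?:area|building|street no|floor)\s*:)")

sentence = "area : al mansorah street no    : 30 building : 6 floor : 3rd"

# Group 1 is the character just before the whitespace; group 2 is the
# whitespace, the word, and the colon. The comma goes between them.
result = pattern.sub(r"\g<1>,\g<2>", sentence)
print(result)
# area : al mansorah, street no    : 30, building : 6, floor : 3rd

# A word that already has a comma in front of it is left alone.
already_delimited = pattern.sub(r"\g<1>,\g<2>", "area: x, floor: 3")
print(already_delimited)
# area: x, floor: 3
```

The leading "area" is untouched because no character precedes it, so group 1 has nothing to match.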


import re

# the first part of the regex: one character that is neither a comma nor whitespace
rex_str = r"([^,\s])(\s+(?:"
# the first word (escaped so that regex metacharacters are matched literally)
rex_str += re.escape(words[0])

# add the rest of the words to the non-capturing group
for i in range(1, len(words)):
    rex_str += "|"
    rex_str += re.escape(words[i])

# close the regex
rex_str += r")\s*:)"

# insert a comma between the first and second capture groups
sentence = re.sub(rex_str, r"\g<1>,\g<2>", sentence)
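
For a list of hundreds of words, the same idea can be wrapped in a small helper. This is only a sketch: the function name `add_delimiters` and its `delimiter` parameter are made up for illustration and are not part of the original answer. It builds the alternation with `"|".join` and `re.escape`, then runs a single substitution.

```python
import re


def add_delimiters(sentence, words, delimiter=","):
    """Insert `delimiter` before each listed word that is followed by a colon.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    if not words:
        # An empty alternation would match before every colon, so bail out early.
        return sentence
    # Escape each word so characters such as "+" or "." are matched literally.
    alternation = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"([^,\s])(\s+(?:" + alternation + r")\s*:)")
    # A function replacement avoids backslash handling in the delimiter itself.
    return pattern.sub(lambda m: m.group(1) + delimiter + m.group(2), sentence)


words = ["area", "building", "street no", "floor"]
sentence = "area : al mansorah street no : 30, building : 6 floor : 3rd"
print(add_delimiters(sentence, words))
# area : al mansorah, street no : 30, building : 6, floor : 3rd
```

The pattern is compiled once per call; if the word list is fixed, compiling it once outside the function and reusing it would avoid rebuilding it for every sentence.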


This question, "python - Add a delimiter to a string before a certain pattern in Python", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/58303700/
