Python regex: removing whitespace inside a pattern match

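I have some well-behaved XML files that I want to reformat (not parse!) using regular expressions. The goal is to have each <trkpt> element, from its opening tag to its closing tag, on a single line.

The following code works, but I would like to do the whole thing in a single regex substitution instead of a loop, so that I don't have to join the strings back together afterwards.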

import re

xml = """
    <trkseg>
      <trkpt lon="-51.2220657617" lat="-30.1072524581">
        <time>2012-08-25T10:20:44Z</time>
        <ele>0</ele>
      </trkpt>
      <trkpt lon="-51.2220657617" lat="-30.1072524581">
        <time>2012-08-25T10:20:44Z</time>
        <ele>0</ele>
      </trkpt>
      <trkpt lon="-51.2220657617" lat="-30.1072524581">
        <time>2012-08-25T10:20:44Z</time>
        <ele>0</ele>
      </trkpt>
    </trkseg>
"""

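for trkpt in re.findall(r'<trkpt.*?</trkpt>', xml, re.DOTALL):
    # \s already matches newlines, so no flag is needed here; note that the
    # fourth positional argument of re.sub is count, not flags.
    print(re.sub(r'>\s*<', '><', trkpt))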
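
A note on re.sub calls of this shape: the function is declared as re.sub(pattern, repl, string, count=0, flags=0), so a flag placed in the fourth position is read as count rather than as a flag. The sketch below illustrates the effect; the fragment string is made up for the demonstration and is not part of the original data.

```python
import re

# Signature: re.sub(pattern, repl, string, count=0, flags=0)
# A flag passed in the fourth position is therefore read as count.
assert int(re.DOTALL) == 16

# Hypothetical fragment with 21 whitespace gaps between adjacent tags.
fragment = "<a>" + " <b/>" * 20 + " </a>"

# Same effect as passing re.DOTALL positionally: only 16 gaps are collapsed.
limited = re.sub(r">\s*<", "><", fragment, count=int(re.DOTALL))

# No count: every gap is collapsed. \s already matches newlines, so
# DOTALL (which only changes what "." matches) is not needed here.
full = re.sub(r">\s*<", "><", fragment)

print(limited.count(" "), full.count(" "))
```

With only a handful of tags per <trkpt> element the limit of 16 is never reached, which is why the loop gives the expected output either way.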

Answers using sed would be welcome as well.

Thanks for reading.

Best answer

How about this:

>>> regex = re.compile(
    r"""\n[ \t]*  # Match a newline plus following whitespace
    (?=           # only if... 
     (?:          # ...the following can be matched:
      (?!<trkpt)  #  (unless an opening <trkpt> tag occurs first)
      .           #  any character
     )*           # any number of times,
     </trkpt>     # followed by a closing </trkpt> tag
    )             # End of lookahead""", 
    re.DOTALL | re.VERBOSE)
>>> print(regex.sub("", xml))

    <trkseg>
      <trkpt lon="-51.2220657617" lat="-30.1072524581"><time>2012-08-25T10:20:44Z</time><ele>0</ele></trkpt>
      <trkpt lon="-51.2220657617" lat="-30.1072524581"><time>2012-08-25T10:20:44Z</time><ele>0</ele></trkpt>
      <trkpt lon="-51.2220657617" lat="-30.1072524581"><time>2012-08-25T10:20:44Z</time><ele>0</ele></trkpt>
    </trkseg>
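
If the lookahead feels hard to read, a different route to a single substitution is to pass a function as the replacement argument of re.sub. This is not the approach of the answer above, only a minimal sketch of an alternative: the names TRKPT, GAP and join_trkpt_lines are invented for illustration, and the sample input is a shortened stand-in for the data in the question.

```python
import re

# Matches one complete <trkpt ...>...</trkpt> element, newlines included.
TRKPT = re.compile(r"<trkpt\b.*?</trkpt>", re.DOTALL)
# Whitespace sitting between a closing '>' and the next '<'.
GAP = re.compile(r">\s+<")

def join_trkpt_lines(xml):
    """Collapse each <trkpt> element onto one line; leave the rest untouched."""
    # The callback receives each <trkpt> match and returns its replacement.
    return TRKPT.sub(lambda m: GAP.sub("><", m.group(0)), xml)

# Illustrative sample, not the original file.
sample = """
<trkseg>
  <trkpt lon="1.0" lat="2.0">
    <time>2012-08-25T10:20:44Z</time>
    <ele>0</ele>
  </trkpt>
</trkseg>
"""

print(join_trkpt_lines(sample))
```

Because the inner substitution only ever sees the text of one <trkpt> element, the indentation and line breaks around <trkseg> are left alone, and nothing has to be joined back together.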

This question about removing whitespace inside a regex pattern match in Python comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/12205455/
