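I have some well-behaved XML files that I want to reformat (not parse!) using regular expressions. The goal is to have each <trkpt> element on a single line.
The following code works, but I would like to do this in a single regex substitution instead of a loop, so that I don't need to concatenate the strings back together.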
import re
xml = """
<trkseg>
<trkpt lon="-51.2220657617" lat="-30.1072524581">
<time>2012-08-25T10:20:44Z</time>
<ele>0</ele>
</trkpt>
<trkpt lon="-51.2220657617" lat="-30.1072524581">
<time>2012-08-25T10:20:44Z</time>
<ele>0</ele>
</trkpt>
<trkpt lon="-51.2220657617" lat="-30.1072524581">
<time>2012-08-25T10:20:44Z</time>
<ele>0</ele>
</trkpt>
</trkseg>
"""
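for trkpt in re.findall(r'<trkpt.*?</trkpt>', xml, re.DOTALL):
    # \s already matches newlines, so re.sub needs no flags here; note that
    # the fourth positional argument of re.sub is count, not flags
    print(re.sub(r'>\s*<', '><', trkpt))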
Answers using sed are also welcome.
Thanks for reading.
Best answer
How about this:
>>> regex = re.compile(
    r"""\n[ \t]*          # Match a newline plus following whitespace
    (?=                   # only if...
      (?:                 # ...the following can be matched:
        (?!<trkpt)        # (unless an opening <trkpt> tag occurs first)
        .                 # any character
      )*                  # any number of times,
      </trkpt>            # followed by a closing </trkpt> tag
    )                     # End of lookahead""",
    re.DOTALL | re.VERBOSE)
>>> print(regex.sub("", xml))
<trkseg>
<trkpt lon="-51.2220657617" lat="-30.1072524581"><time>2012-08-25T10:20:44Z</time><ele>0</ele></trkpt>
<trkpt lon="-51.2220657617" lat="-30.1072524581"><time>2012-08-25T10:20:44Z</time><ele>0</ele></trkpt>
<trkpt lon="-51.2220657617" lat="-30.1072524581"><time>2012-08-25T10:20:44Z</time><ele>0</ele></trkpt>
</trkseg>
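For comparison, here is a minimal sketch of another way to get a single substitution call: pass a function as the replacement to `re.sub`, so the outer pattern selects each `<trkpt>` element and the callback collapses the whitespace between tags inside it. This is not the approach from the answer above, and the helper name `collapse_trkpt` and the shortened sample data are made up for illustration.

```python
import re

def collapse_trkpt(xml):
    """Return xml with each <trkpt>...</trkpt> element joined onto one line."""
    # Outer pattern finds each whole element; DOTALL lets '.' cross newlines.
    trkpt_block = re.compile(r"<trkpt.*?</trkpt>", re.DOTALL)
    # The callback strips whitespace between tags inside the matched element only.
    return trkpt_block.sub(lambda m: re.sub(r">\s*<", "><", m.group(0)), xml)

# Hypothetical, shortened sample in the same shape as the question's data.
sample = """<trkseg>
<trkpt lon="1.0" lat="2.0">
<time>2012-08-25T10:20:44Z</time>
<ele>0</ele>
</trkpt>
</trkseg>
"""
print(collapse_trkpt(sample))
```

Like the lookahead version, this leaves everything outside the `<trkpt>` elements untouched, and the result comes back as one string, so nothing has to be joined afterwards. It relies on the same assumption as the original loop, namely that `<trkpt>` elements are never nested.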
Regarding removing whitespace inside a pattern match with Python regular expressions, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12205455/