I have a file containing lines like the following:
saldkfjaslk
    asdlkfja
    alsdkfjlk
aslkda;kdfsdlkfaj
sladkfjalskdfjlaskd
    sldkfaj
    lsadkfj
qwewrewst
se0polkjlkj
lpoerlwoej
    alskdjf
    asldkfjljlkjlk
sadlkfa
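I want to group each line that starts with a character (not a space) together with the lines starting with a space that follow it. I also want to omit any line whose next line does not start with a space. The desired output for the example above looks like this: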
[('saldkfjaslk', 'asdlkfja', 'alsdkfjlk'),
('sladkfjalskdfjlaskd', 'sldkfaj', 'lsadkfj'),
('lpoerlwoej', 'alskdjf', 'asldkfjljlkjlk')]
How can I parse this file with Python?
Best answer
>>> import re
>>> # s holds the entire contents of the file as a single string
>>> regex = re.compile(r"^\S.*(?:\n\s.*)+", re.MULTILINE)
>>> [tuple(match.split()) for match in regex.findall(s)]
[('saldkfjaslk', 'asdlkfja', 'alsdkfjlk'),
('sladkfjalskdfjlaskd', 'sldkfaj', 'lsadkfj'),
('lpoerlwoej', 'alskdjf', 'asldkfjljlkjlk')]
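For comparison, here is a minimal sketch of the same grouping done line by line, without a regular expression. This is a swapped-in technique, not the method from the answer. The function name `group_indented`, the `sample` string, and the choice to skip blank lines are assumptions made for illustration. Unlike `match.split()`, it keeps a line that contains internal spaces as a single item.

```python
def group_indented(lines):
    """Group each unindented line with the indented lines that follow it.

    Unindented lines with no indented lines after them are dropped.
    """
    groups = []
    current = None  # group being built; None until the first unindented line
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if line[0].isspace():
            # Indented line: attach it to the current group, if there is one.
            if current is not None:
                current.append(line.strip())
        else:
            # Unindented line: close the previous group, keeping it only
            # if it collected at least one indented line.
            if current is not None and len(current) > 1:
                groups.append(tuple(current))
            current = [line.strip()]
    if current is not None and len(current) > 1:
        groups.append(tuple(current))
    return groups


# Hypothetical sample data mirroring the file from the question.
sample = """\
saldkfjaslk
    asdlkfja
    alsdkfjlk
aslkda;kdfsdlkfaj
sladkfjalskdfjlaskd
    sldkfaj
    lsadkfj
qwewrewst
se0polkjlkj
lpoerlwoej
    alskdjf
    asldkfjljlkjlk
sadlkfa
"""

print(group_indented(sample.splitlines()))
```

Because the function only iterates over its argument, an open file object can be passed in directly, so the whole file does not have to be read into one string first.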
Explanation of the regular expression:
^       # Start of line
\S      # Match a non-whitespace character
.*      # Match the rest of the line
(?:     # Match...
  \n    #   a newline character
  \s    #   a whitespace character
  .*    #   and the rest of the line
)+      # once or more
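The commented breakdown can also be written directly in code with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. The following is a self-contained sketch of that idea. The names `pattern`, `s`, and `groups` and the inline sample text are assumptions for illustration, standing in for the real file contents.

```python
import re

# Same pattern as the one-liner, spelled out with comments.
pattern = re.compile(
    r"""
    ^          # start of a line (re.MULTILINE makes ^ match at each line)
    \S .*      # a non-whitespace character, then the rest of the line
    (?:        # non-capturing group:
        \n     #   a newline character
        \s .*  #   a whitespace character, then the rest of that line
    )+         # repeated once or more
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample data mirroring the file from the question.
s = """\
saldkfjaslk
    asdlkfja
    alsdkfjlk
aslkda;kdfsdlkfaj
sladkfjalskdfjlaskd
    sldkfaj
    lsadkfj
qwewrewst
se0polkjlkj
lpoerlwoej
    alskdjf
    asldkfjljlkjlk
sadlkfa
"""

# Each match is one unindented line plus its indented lines;
# str.split() breaks the match on whitespace, which also strips indentation.
groups = [tuple(match.split()) for match in pattern.findall(s)]
print(groups)
```

Two caveats apply to this approach. `\s` also matches a newline, so a blank line directly after an unindented line would pull the following line into the group; replacing `\s` with `[ \t]` is one way to avoid that. And `split()` breaks on any whitespace, so a line containing internal spaces ends up as several tuple items.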
Source: a similar question on Stack Overflow, "python - Grouping lines by line type": https://stackoverflow.com/questions/14689974/