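I am trying to retrieve the front matter from a .md file. When every front matter key sits on a single line, I can retrieve the content.
For example: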
---
title: "Meeting"
date: 2019-03-14T07:51:28+01:00
draft: false
status:["process", "todo"]
---
So I wrote the following Python script to read the front matter:
import re

def get_front_matter(file, start='---', end='---'):
    """Strip file and retrieve front matter then format the value"""
    content = {}
    with open(file, 'r', encoding='UTF-8') as file_content:
        # Skip everything up to the opening delimiter
        for content_line in file_content:
            if content_line.strip() == start:
                break
        # Read key: value pairs until the closing delimiter
        for content_line in file_content:
            if content_line.strip() == end:
                break
            line_data = content_line.split(':', 1)
            # If we cannot split decently, carry on
            if len(line_data) != 2:
                continue
            # format the string to store in dict for better usage
            content[line_data[0]] = re.sub(r"[\n\t]*", "", line_data[1]).strip(' "')
    return content
But if the front matter's status value spans multiple lines, I run into a problem:
---
title: "Meeting"
date: 2019-03-14T07:51:28+01:00
draft: false
status:
[
"process",
"todo",
"hold"
]
---
When I try to read the front matter of the file above, I get an empty value for status, but the result should look like this:
{'title': 'Meeting', 'date': '2019-03-14T07:51:28+01:00', 'draft': 'false', 'status': '["process", "todo", "hold"]'}
Is there another way to read the front matter, based on lines or tags? I tried a few regular expressions but could not capture a group of lines.
Best answer
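I kept almost all of your code. The key point is not to add a value to the result until we are sure we have collected the complete value (for when it is split across several lines). We check this by looking at the next line: if it is a valid pair (key: some value), or if it is the end delimiter ---, we add the previous key and its content to the result. I hope the comments make things clearer.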
import re

def get_front_matter(file, start='---', end='---'):
    """Strip file and retrieve front matter then format the value"""
    result = {}
    with open(file, 'r', encoding='UTF-8') as file_content:
        for content_line in file_content:
            if content_line.strip() == start:
                break
        content = ''
        key = ''
        for content_line in file_content:
            if content_line.strip() == end:
                if key and content:
                    # add the last key: content before breaking out
                    result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                break
            line_data = content_line.split(':', 1)
            if len(line_data) == 2 and not content:
                # first key: content pair; there is no previous content yet,
                # so keep it and check the next line first
                key = line_data[0]
                content = line_data[1]
                continue
            elif len(line_data) == 2:  # we found another valid pair
                # add previous (key, content) and keep the new (key, content)
                result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                key = line_data[0]
                content = line_data[1]
            else:
                # not a valid key: value pair, so it is part of a value
                # split over multiple lines; append it to the previous value
                content += content_line
    return result
Note: I renamed content to result. This code will break in the following case:
title: "Meeting"
date: 2019-03-14T07:51:28+01:00
draft: false
status:
[
"somevalue:process", # if the value contains ':'
"todo",
"hold"
]
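Here you did not specify how to tell a key apart from a value that contains ':' (when no key precedes it). I hope this will not be a problem for you.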
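If it does turn out to be a problem, one possible workaround is to be stricter about what counts as a key. The sketch below is not part of the code above; it assumes that a key is an identifier at the very start of a line, directly followed by ':', so a quoted value such as "somevalue:process" is treated as a continuation. The names KEY_LINE, FRONT_MATTER, clean and parse_front_matter are made up for illustration, and the function works on a string rather than a file path. It also uses one regex to capture the whole block between the --- lines, which the question mentions having trouble with.

```python
import re

# Assumption: a key looks like an identifier at column 0, followed by ':'.
# A line starting with a quote or bracket never matches, so it is a continuation.
KEY_LINE = re.compile(r"^([A-Za-z_][\w-]*)[ \t]*:(.*)$")

# Capture everything between the first pair of '---' delimiter lines.
FRONT_MATTER = re.compile(r"^---[ \t]*\n(.*?)^---[ \t]*$", re.DOTALL | re.MULTILINE)


def clean(value):
    """Drop newlines and tabs, then trim surrounding spaces and double quotes."""
    return re.sub(r"[\n\t]+", "", value).strip(' "')


def parse_front_matter(text):
    """Parse front matter from a string into a dict of cleaned string values."""
    match = FRONT_MATTER.search(text)
    if match is None:
        return {}
    result = {}
    key = None
    for line in match.group(1).splitlines():
        key_match = KEY_LINE.match(line)
        if key_match:
            # A new key starts; store whatever follows the colon on this line.
            key = key_match.group(1)
            result[key] = key_match.group(2)
        elif key is not None:
            # Continuation of a multi-line value: glue it to the current key.
            result[key] += line.strip()
    return {k: clean(v) for k, v in result.items()}


sample = (
    '---\n'
    'title: "Meeting"\n'
    'date: 2019-03-14T07:51:28+01:00\n'
    'draft: false\n'
    'status:\n'
    '[\n'
    '"somevalue:process",\n'
    '"todo",\n'
    '"hold"\n'
    ']\n'
    '---\n'
    'Body text\n'
)
print(parse_front_matter(sample))
```

This still breaks if a continuation line is an unquoted word followed by ':' (for example todo:later), because it then looks exactly like a key. Telling those apart reliably needs a real YAML parser rather than line splitting.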
Regarding python - regex re.sub for a front matter problem, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59927959/