python - Using re.sub to read a file's front matter

I am trying to retrieve the front matter from a .md file. When every front-matter field fits on a single line, I can retrieve the content.

For example:

---
title: "Meeting"
date: 2019-03-14T07:51:28+01:00
draft: false
status:["process", "todo"]
---

So I wrote the following Python script to read the front matter:

import re

def get_front_matter(file, start='---', end='---'):
    """Strip file and retrieve front matter then format the value"""
    content = {}
    with open(file, 'r', encoding='UTF-8') as file_content:
        for content_line in file_content:
            if content_line.strip() == start:
                break
        for content_line in file_content:
            if content_line.strip() == end:
                break

            line_data = content_line.split(':', 1)
            # If we cannot split decently, carry on
            if len(line_data) != 2:
                continue
            # format the string to store in dict for better usage
            content[line_data[0]] = re.sub(r"[\n\t]*", "", line_data[1]).strip(' "')
    return content

But if the status field in my front matter spans multiple lines, I run into a problem.

---
title: "Meeting"
date: 2019-03-14T07:51:28+01:00
draft: false
status:
  [
    "process",
    "todo",
    "hold"
  ]
---

When I try to read the front matter of the file above, I get a blank value for status, but it should look like this:

{'title': 'Meeting', 'date': '2019-03-14T07:51:28+01:00', 'draft': 'false', 'status': '["process", "todo", "hold"]'}

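Is there another way to read the front matter, either line by line or field by field? I tried a few regular expressions, but I could not capture a group of lines.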

Best answer

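I kept almost all of your code. The key is not to add a value to the result until we are sure we have collected the complete value, since it may be split across multiple lines. We do this by checking the next line: if it is a valid "key: some value" line, or the closing --- marker, the previous key and its content are added to the result. I hope the comments make things clearer: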

    import re

    def get_front_matter(file, start='---', end='---'):
        """Strip file and retrieve front matter then format the value"""
        result = {}
        with open(file, 'r', encoding='UTF-8') as file_content:
            for content_line in file_content:
                if content_line.strip() == start:
                    break

            content = ''
            key = ''
            for content_line in file_content:
                if content_line.strip() == end:
                    if key and content:
                        # add last key: content before breaking out
                        result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                    break

                line_data = content_line.split(':', 1)
                if len(line_data) == 2 and not content:
                    # first "key: content" pair: there is no previous content yet, so keep it and check the next line first
                    key = line_data[0]
                    content = line_data[1]
                    continue
                elif len(line_data) == 2:  # we found another valid value 
                    # add previous (key, content) and keep the new (key , content)
                    result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                    key = line_data[0]
                    content = line_data[1]
                else:
                    # not a valid "key: value" line: append it to the previous value, because that value is split across multiple lines
                    content += content_line

        return result
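
One detail worth checking: the pattern [\n\t]* removes only newlines and tabs, so the spaces used to indent a multi-line value stay in the result. The sketch below shows one possible way to tidy such a value. The helper name normalize_value is hypothetical and not part of the answer above, and it assumes list values are written in JSON-compatible syntax.

```python
import json
import re


def normalize_value(raw):
    """Collapse a raw front-matter value into a compact single-line string.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Collapse every run of whitespace (newlines, tabs, indentation) to one space.
    text = re.sub(r"\s+", " ", raw).strip()
    if text.startswith("["):
        try:
            # Assumption: the list is valid JSON, so re-serializing it
            # yields canonical spacing between the items.
            return json.dumps(json.loads(text))
        except ValueError:
            # Not valid JSON: fall through and treat it as a plain string.
            pass
    return text.strip(' "')


raw_status = '\n  [\n    "process",\n    "todo",\n    "hold"\n  ]\n'
print(normalize_value(raw_status))      # ["process", "todo", "hold"]
print(normalize_value(' "Meeting"\n'))  # Meeting
```

Calling such a helper in place of the two re.sub(...).strip(' "') expressions would be one way to use it.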

Note: I renamed the content dict to result, and this code will break in the following case:

    title: "Meeting"
    date: 2019-03-14T07:51:28+01:00
    draft: false
    status:
      [
        "somevalue:process",  # if the value contains ':'
        "todo",
        "hold"
      ]

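Here you did not specify how to tell a key apart from a value that contains ':' when that value is not preceded by a key on the same line. I hope this will not be a problem for you.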
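
If it does become a problem, one possible convention (an assumption, since the question does not define one) is to treat a line as a key only when it starts at column 0 with an identifier-like name, and to treat every other line as a continuation of the previous value. A minimal sketch under that assumption follows. The names KEY_LINE and parse_front_matter are hypothetical, and it parses a string rather than a file to stay self-contained.

```python
import re

# Assumption: a key starts at column 0 and consists of word characters or "-".
KEY_LINE = re.compile(r"^([A-Za-z_][\w-]*):(.*)$")


def parse_front_matter(text, start="---", end="---"):
    """Parse front matter from a string; non-key lines continue the last value."""
    result = {}
    key = None
    inside = False
    for line in text.splitlines():
        if not inside:
            # Skip everything up to and including the opening marker.
            inside = line.strip() == start
            continue
        if line.strip() == end:
            break
        match = KEY_LINE.match(line)
        if match:
            # A new key: store whatever follows the first colon.
            key = match.group(1)
            result[key] = match.group(2).strip()
        elif key is not None:
            # Indented or quoted line: append it to the current value.
            result[key] = (result[key] + " " + line.strip()).strip()
    # Remove surrounding spaces and double quotes, as the original code does.
    return {k: v.strip(' "') for k, v in result.items()}


sample = """---
title: "Meeting"
date: 2019-03-14T07:51:28+01:00
status:
  [
    "somevalue:process",
    "todo"
  ]
---
body text
"""
print(parse_front_matter(sample)["status"])  # [ "somevalue:process", "todo" ]
```

This still misreads an unquoted continuation line that starts at column 0 and looks like name:value, so it narrows the problem without fully solving it.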

A similar question about python and using re.sub to read a file's front matter can be found on Stack Overflow: https://stackoverflow.com/questions/59927959/
