python - How to extract headings from a Google Doc using the API

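I'm currently trying to create a Python script that checks a Google Doc against various on-page SEO metrics.

The Google Docs API has a good sample showing how to extract all the text from a Google Doc. However, it only returns plain text with no formatting.

To run my checks, I need to be able to separate out H1, H2-H4, bold text, and so on, but after two hours of experimenting and searching the API docs and the web, I can't work out how to edit the following loop to get (for example) all the HEADING_2 elements.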

    text = ''
    for value in elements:
        if 'paragraph' in value:
            elements = value.get('paragraph').get('elements')
            for elem in elements:
                text += read_paragraph_element(elem)
        elif 'table' in value:
            # The text in table cells are in nested Structural Elements and tables may be
            # nested.
            table = value.get('table')
            for row in table.get('tableRows'):
                cells = row.get('tableCells')
                for cell in cells:
                    text += read_strucutural_elements(cell.get('content'))
        elif 'tableOfContents' in value:
            # The text in the TOC is also in a Structural Element.
            toc = value.get('tableOfContents')
            text += read_strucutural_elements(toc.get('content'))
    return text

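Any help is appreciated. Thanks.

Best answer

I believe your goal and your current situation are as follows.

You want to retrieve the text of paragraphs whose style is HEADING_2.
You want to achieve this using googleapis for Python.
You want to achieve this using the script in your question.
You have already been able to get values from a Google Doc using the Docs API.

Modification points:

In this case, I think the text needs to be retrieved only when the value of namedStyleType (found under the paragraph's paragraphStyle) is HEADING_2.

When this is reflected in your script, it becomes as follows.

Modified script: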

From:

    for value in elements:
        if 'paragraph' in value:
            elements = value.get('paragraph').get('elements')

To:

    for value in elements:
        if 'paragraph' in value and value['paragraph']['paragraphStyle']['namedStyleType'] == 'HEADING_2':  # Modified
            elements = value.get('paragraph').get('elements')
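
Since the question also mentions H1 and H2-H4, the same namedStyleType check can be generalized to group paragraph text by style. The following is only a minimal sketch, not part of the original answer: the helper names `paragraph_text` and `collect_by_style` and the `sample_content` data are hypothetical, the sample is a hand-written list shaped like the structural elements the loop above iterates over, and only top-level paragraphs are handled (tables and the table of contents are skipped).

```python
from collections import defaultdict


def paragraph_text(paragraph):
    """Join the text runs of one paragraph dict into a single string."""
    parts = []
    for elem in paragraph.get('elements', []):
        text_run = elem.get('textRun')
        if text_run:
            parts.append(text_run.get('content', ''))
    return ''.join(parts).strip()


def collect_by_style(elements):
    """Group the text of top-level paragraphs by their namedStyleType."""
    grouped = defaultdict(list)
    for value in elements:
        paragraph = value.get('paragraph')
        if not paragraph:
            continue  # this sketch ignores tables and the table of contents
        style = paragraph.get('paragraphStyle', {}).get('namedStyleType')
        text = paragraph_text(paragraph)
        if style and text:
            grouped[style].append(text)
    return dict(grouped)


# Hypothetical sample data, written by hand to mimic the structural elements.
sample_content = [
    {'paragraph': {
        'elements': [{'textRun': {'content': 'Page title\n'}}],
        'paragraphStyle': {'namedStyleType': 'HEADING_1'}}},
    {'paragraph': {
        'elements': [{'textRun': {'content': 'First section\n'}}],
        'paragraphStyle': {'namedStyleType': 'HEADING_2'}}},
    {'paragraph': {
        'elements': [{'textRun': {'content': 'Body text.\n'}}],
        'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT'}}},
    {'paragraph': {
        'elements': [{'textRun': {'content': 'Second '}},
                     {'textRun': {'content': 'section\n'}}],
        'paragraphStyle': {'namedStyleType': 'HEADING_2'}}},
]

headings = collect_by_style(sample_content)
print(headings.get('HEADING_2'))  # ['First section', 'Second section']
```

Using `.get()` with defaults means a paragraph without a paragraphStyle is skipped rather than raising a KeyError, which differs slightly from the direct indexing in the modified script above.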

References:

NamedStyleType

Regarding "python - How to extract headings from a Google Doc using the API", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66413977/
