Python regex - identifying the first and last items of a list

I need to convert some text files into HTML. I'm stuck on converting lists into HTML unordered lists. Example source:

some text in the document
* item 1
* item 2
* item 3
some other text

The output should be:

some text in the document
<ul>
    <li>item 1</li>
    <li>item 2</li>
    <li>item 3</li>
</ul>
some other text

Currently, I have this:

import re
r = re.compile(r'\*(.*)\n')
# raw string, so \1 is a backreference rather than the control character '\x01'
html = r.sub(r'<li>\1</li>', the_text_document)

It creates the <li> items, but without the surrounding <ul> tags.
How can I identify the first and last items and wrap them in <ul> tags?
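The question as asked can also be answered with regex alone. The sketch below is not from the original answer: instead of finding the first and last items separately, it matches each run of consecutive lines that start with "* " as one block and wraps the whole block in <ul>. It assumes every item line starts with a star followed by a space; the names LIST_BLOCK, ITEM and wrap_lists are hypothetical.

```python
import re

# One or more consecutive lines that start with "* " (hypothetical names).
LIST_BLOCK = re.compile(r'(?:^\* .*(?:\n|$))+', re.MULTILINE)
ITEM = re.compile(r'^\* (.*)$', re.MULTILINE)


def wrap_lists(text):
    """Wrap every run of '* ' lines in <ul> ... </ul> with one <li> per line."""
    def to_ul(match):
        # Turn each "* foo" line of the run into "    <li>foo</li>".
        items = ITEM.sub(r'    <li>\1</li>', match.group(0).rstrip('\n'))
        return '<ul>\n' + items + '\n</ul>\n'
    return LIST_BLOCK.sub(to_ul, text)


source = ("some text in the document\n"
          "* item 1\n* item 2\n* item 3\n"
          "some other text\n")
print(wrap_lists(source))
```

Because the callback receives the whole run at once, the first and last items need no special handling; any line that does not start with "* " ends the run, so separate lists get separate <ul> blocks.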

Best Answer

Or use BeautifulSoup:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/

EDIT

Apparently I have to give you some hints on how to read the documentation.

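Open the link. There is a big (teal) menu on the left. If you look closely, you'll see that the documentation is divided into sections:
- stuff
- Navigating the tree
- Searching the tree
- Modifying the tree (got it)
- Output (got it!)

and a lot more stuff.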

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

Don't stop reading after the first sentence... the last sentence is very important, and so is everything in between.

In other words, you can create an empty document... let's say:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<div></div>", "html.parser")  # name the parser explicitly
document = soup.div

Then read your text line by line, and as long as a line is plain text, do this:

document.append(line)

If the line starts with `*`:

ul = soup.new_tag('ul')  # new_tag() is a method of the BeautifulSoup object, not of a Tag
document.append(ul)
document = ul

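Then insert all the li elements into document... Once you've finished reading the * lines, just go back up to the parent (document = document.parent) so that document points at the div again. Keep going like that... you can even nest a ul inside a ul recursively.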
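Put together, the steps above look roughly like the sketch below. It is a minimal illustration, not the answerer's code: it assumes the third-party beautifulsoup4 package, the built-in "html.parser", and that item lines start with "* "; the function name text_to_html is hypothetical.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def text_to_html(text):
    """Build a <div> whose '* ' lines are grouped into <ul>/<li> elements."""
    soup = BeautifulSoup("<div></div>", "html.parser")
    container = soup.div   # plain text lines are appended here
    current = container    # points at the open <ul> while inside a list

    for line in text.splitlines():
        if line.startswith("* "):
            if current is container:
                # First item of a run: open a new <ul> and descend into it.
                ul = soup.new_tag("ul")
                container.append(ul)
                current = ul
            li = soup.new_tag("li")
            li.string = line[2:]  # drop the leading "* "
            current.append(li)
        else:
            if current is not container:
                # The run of items has ended: pop back up to the <div>.
                current = current.parent
            current.append(line + "\n")

    return str(container)


html = text_to_html(
    "some text in the document\n* item 1\n* item 2\n* item 3\nsome other text"
)
print(html)
```

This handles a single level of list only; the nested ul-inside-ul case mentioned above would additionally need to track indentation.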

Once you've parsed everything... you can do

str(document)

or

document.prettify()

EDIT

I just realized that you're not editing HTML but unformatted text. You could try Markdown:

http://daringfireball.net/projects/markdown/
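A minimal sketch of the Markdown route, assuming the third-party Python-Markdown package (`pip install markdown`) and its `markdown.markdown()` function. Blank lines are placed around the list because Markdown may otherwise fold the `*` lines into the preceding paragraph; note also that the plain lines come back wrapped in <p> tags, which the desired output in the question does not have.

```python
import markdown  # third-party: pip install markdown

source = (
    "some text in the document\n"
    "\n"  # blank line so the list is not read as part of the paragraph
    "* item 1\n"
    "* item 2\n"
    "* item 3\n"
    "\n"
    "some other text\n"
)

# Convert the whole document; the '*' run becomes a single <ul>.
html = markdown.markdown(source)
print(html)
```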

A similar question about identifying the first and last items of a list with a Python regex can be found on Stack Overflow: https://stackoverflow.com/questions/11383949/
