python - BeautifulSoup: convert <br> to Python newlines

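I have an HTML table in which each cell contains several lines of text and data that I ultimately want to extract. The site I am scraping separates those lines with <br> tags for readability. Here is an example of such a cell: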

<td class="cell">-<br>21%<br>1<br>
<font color="red">5001</font><br>12%
                </td>

How can I convert these breaks into Pandas-compatible newlines (i.e., a single 4-line string separated by \n)?

Using this snippet:

for cell in soup.find_all('td'):
    cell.replace_with(cell.get_text('\n',strip=True))

results in NaN values for every entry in the table.

Best answer

You can replace each 'br' tag with '\n':

for linebreak in soup.find_all('br'):
    linebreak.replace_with('\n')
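
As a minimal end-to-end sketch of that replacement, the snippet below parses the cell from the question, swaps each <br> for a newline, and then cleans up the blank pieces left by the source indentation. The surrounding <table>/<tr> wrapper, the `cell_lines` helper, and the choice of the built-in "html.parser" are assumptions made for illustration; they are not part of the original answer.

```python
from bs4 import BeautifulSoup

# The cell from the question, wrapped in a minimal table (the wrapper is assumed).
html = """
<table>
  <tr>
    <td class="cell">-<br>21%<br>1<br>
<font color="red">5001</font><br>12%
                </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Swap every <br> tag for a literal newline character.
for linebreak in soup.find_all("br"):
    linebreak.replace_with("\n")


def cell_lines(cell):
    """Hypothetical helper: split a cell's text on newlines and drop blank pieces."""
    return [part.strip() for part in cell.get_text().split("\n") if part.strip()]


# One newline-separated string per <td>.
cells = ["\n".join(cell_lines(td)) for td in soup.find_all("td")]
print(repr(cells[0]))
```

The cleanup step is there because the raw HTML itself contains newlines and indentation, so the text after replacement would otherwise include empty lines and trailing spaces.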

Hope this helps.
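
A possible alternative, offered as a sketch rather than as part of the original answer: the snippet in the question calls replace_with on the <td> tag itself, which swaps the whole cell element for a bare string, so the row is left without cell elements. Reading the text with get_text('\n', strip=True) while leaving the tag in place avoids that. The second cell ("a", "b") and the `rows` variable are hypothetical and only show that the approach works per cell.

```python
from bs4 import BeautifulSoup

# First cell is from the question; the second cell is made up for illustration.
html = """<table>
<tr><td class="cell">-<br>21%<br>1<br>
<font color="red">5001</font><br>12%
                </td><td class="cell">a<br>b</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Build a list of rows; each cell becomes one newline-separated string.
# The <td> tags are only read, never replaced, so the table stays intact.
rows = []
for tr in soup.find_all("tr"):
    rows.append([td.get_text("\n", strip=True) for td in tr.find_all("td")])

print(rows)
```

A list of rows like this can be passed directly to pandas.DataFrame(rows) if a DataFrame is the end goal.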

For more on python - BeautifulSoup: convert <br> to Python newlines, see the similar question on Stack Overflow: https://stackoverflow.com/questions/57750303/
