python - BeautifulSoup Python: saving output links to a txt file

Tags: python web-scraping beautifulsoup

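I am trying to collect the links on a web page using BeautifulSoup. So far I have been able to do that and print them to the command prompt with the print statement that is currently commented out in the code. The problem is that when the links are saved to Output.txt, they overwrite one another and only the last link is saved. Any help is greatly appreciated!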

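In case you have suggestions for doing all of this in one program, here is my end goal: I want to search the links in the txt file to determine whether they contain a specific text. If they do, I want to return "broken link"; otherwise, "not broken".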

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc)  # html_doc is the source code of the website I am using

for link in soup.find_all(rel="bookmark"):
    Gamma = link.get('href')
    f = open('Output.txt', 'w')
    f.write(Gamma)
    f.close()
    # print(Gamma)

Best answer

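You need to open the file for writing once, before the loop, and call write() inside the loop. Opening it with mode 'w' on every iteration truncates the file each time, which is why only the last link survives: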

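from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')  # name the parser explicitly

# Open the file once, before the loop, so earlier links are not overwritten.
with open('Output.txt', 'w') as f:
    for link in soup.find_all(rel="bookmark"):
        f.write(link.get('href') + '\n')  # one link per line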
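
To see why the position of open() matters, here is a minimal sketch that needs no web page at all. The URLs and the temporary file location are made up for illustration; only the behavior of mode 'w' is the point.

```python
import os
import tempfile

# Hypothetical links standing in for the hrefs found by BeautifulSoup.
links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# Write to a temporary directory so the demo does not touch a real Output.txt.
path = os.path.join(tempfile.mkdtemp(), "Output.txt")

# Reopening with 'w' inside the loop: every open() empties the file first.
for url in links:
    with open(path, "w") as f:
        f.write(url + "\n")
with open(path) as f:
    reopened_each_time = f.read().splitlines()

# Opening once before the loop: every write lands in the same open file.
with open(path, "w") as f:
    for url in links:
        f.write(url + "\n")
with open(path) as f:
    opened_once = f.read().splitlines()

print(reopened_each_time)  # only the last link is left
print(opened_once)         # all three links, one per line
```

Opening the file in append mode ('a') inside the loop would also keep every link, but it would keep adding to whatever a previous run left behind, so opening once with 'w' is usually the simpler choice.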

Also, note that using the with context manager when opening the file means you don't have to worry about closing it manually.
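
As for the end goal mentioned in the question, the following is only a rough sketch under assumptions: it treats "contains a specific text" as a substring test on the URL itself, and the function name classify_links, the marker "error", and the sample URLs are all hypothetical. If the intent is to inspect the page behind each link, the check would need an HTTP request instead.

```python
import os
import tempfile


def classify_links(path, marker):
    """Return (url, label) pairs for each non-empty line of the file at path.

    Assumption: a link counts as "broken link" when the marker text
    appears anywhere in the URL, and as "not broken" otherwise.
    """
    results = []
    with open(path) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            label = "broken link" if marker in url else "not broken"
            results.append((url, label))
    return results


# Hypothetical sample data standing in for the real Output.txt.
sample_path = os.path.join(tempfile.mkdtemp(), "Output.txt")
with open(sample_path, "w") as f:
    f.write("https://example.com/post/1\n")
    f.write("https://example.com/error/404\n")

for url, label in classify_links(sample_path, "error"):
    print(label, url)
```

To do everything in one program, the same substring test could be applied to link.get('href') directly inside the BeautifulSoup loop, which would avoid the round trip through the text file.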

Regarding python - BeautifulSoup Python: saving output links to a txt file, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25857605/
