python - BeautifulSoup 4 Python web scraping to a txt file

I am trying to write data scraped with BeautifulSoup to a text file. The data is printed to the console, but it is not written to the file.

import requests
from bs4 import BeautifulSoup
 
base_url = 'https://www.aaabbbccc.com'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

outF = open("myOutFile.txt", "w", encoding='utf-8')
for story_heading in soup.find_all(class_="col-md-4"): 
    
    if story_heading.a: 
        print(story_heading.a.text.replace("\n", " ").strip())
        outF.write(str(story_heading))
        outF.write("\n")
 
    else: 
        print(story_heading.contents[0].strip())

outF.close()

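Best answer

I always use the a+ mode!

If the text file does not exist on your disk, it is created and written to. If the text file already exists, your content is appended to the end of it!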

with open("myOutFile.txt", "a+") as f:
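To make the append behaviour described above concrete, here is a minimal sketch that writes to the same file twice and reads it back. The helper names `append_line` and `read_lines` and the temporary directory are assumptions made for this illustration; they are not part of the original answer. Note that "a+" also opens the file for reading, so plain "a" would be enough when you only write.

```python
import os
import tempfile


def append_line(path, line):
    """Append one line to `path`; "a+" creates the file if it is missing."""
    with open(path, "a+", encoding="utf-8") as f:
        f.write(line + "\n")


def read_lines(path):
    """Return the file's lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


# Hypothetical demo location: a fresh temporary directory, so the file
# does not exist before the first call.
demo_path = os.path.join(tempfile.mkdtemp(), "myOutFile.txt")

append_line(demo_path, "first run")   # file is created here
append_line(demo_path, "second run")  # content is appended, not overwritten

demo_lines = read_lines(demo_path)
print(demo_lines)
```

With mode "w" instead, the second call would truncate the file and only the last line would remain.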

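import requests
from bs4 import BeautifulSoup

base_url = 'https://www.aaabbbccc.com'
r = requests.get(base_url)
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(r.text, "html.parser")

# "a+" creates the file if needed and appends to it otherwise;
# the with block closes the file even if an error occurs
with open("myOutFile.txt", "a+", encoding='utf-8') as f:
    for story_heading in soup.find_all(class_="col-md-4"):

        if story_heading.a:
            print(story_heading.a.get_text())
            # Writes the element's full HTML markup, one element per line
            f.write(str(story_heading)+"\n")
        else:
            # Elements without a link are only printed, not written
            print(story_heading.contents[0].strip())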
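The answer's code writes `str(story_heading)`, which is the element's HTML markup, and it writes only when the element contains a link. If the goal is to store the same text that appears in the console, one possible variation is sketched below. It parses an inline sample instead of fetching a page, and `SAMPLE_HTML`, `extract_headings`, and `save_headings` are hypothetical names introduced for this example only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<div class="col-md-4"><a href="/one">First
 story</a></div>
<div class="col-md-4">Plain heading</div>
"""


def extract_headings(html):
    """Return the visible text of every element with class col-md-4."""
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for block in soup.find_all(class_="col-md-4"):
        # Prefer the link text when there is a link, else the block's own text.
        target = block.a if block.a else block
        # Collapse newlines and repeated spaces into single spaces.
        text = " ".join(target.get_text().split())
        if text:
            headings.append(text)
    return headings


def save_headings(headings, path):
    """Append each heading to `path` as its own line."""
    with open(path, "a+", encoding="utf-8") as f:
        for heading in headings:
            f.write(heading + "\n")


headings = extract_headings(SAMPLE_HTML)
print(headings)
```

Calling `save_headings(headings, "myOutFile.txt")` afterwards would append one line per heading. Separating extraction from writing also makes it easy to check the scraped values before anything touches the disk.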

Regarding python - BeautifulSoup 4 Python web scraping to a txt file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65815005/
