This is the HTML data of the web page I am scraping. As you can see, it has multiple tabs. (https://paste.pythondiscord.com/resaxivedo.py)
This is my code:
import csv

from bs4 import BeautifulSoup

with open("tabledata.html", "r") as f:
    contents = f.read()

outfile = open("table_data.csv", "w", newline='')
writer = csv.writer(outfile)
tree = BeautifulSoup(contents, "lxml")

# Collect the text of every element with class "date"
dates = tree.find_all(class_="date")
list_of_dates = [date.text for date in dates]

# Only the first table is selected here
table_tag = tree.select("table")[0]
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in table_tag.select("tr")]

# Wrap the date in a list so it is written as one cell, not one cell per character
writer.writerow([list_of_dates[0]])
for data in tab_data:
    print(' '.join(data))
    writer.writerow(data)
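As you can see, I am using [0] to select the table and the date. How can I create a loop so that the data from all tables in the HTML page is printed?
Best answer
Something like this: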
for table_tag in tree.select("table"):
    tab_data = [[item.text for item in row_data.select("th,td")]
                for row_data in table_tag.select("tr")]
    # Wrap the date in a list so it is written as one cell
    writer.writerow([list_of_dates[0]])
    for data in tab_data:
        print(' '.join(data))
        writer.writerow(data)
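Note that the loop above still writes list_of_dates[0] before every table. If the page has one class="date" element per table, in the same order as the tables (an assumption, since the real markup is only in the linked paste), the dates and tables could be paired with zip(). A minimal, self-contained sketch follows. SAMPLE_HTML and tables_to_rows are hypothetical names made up for illustration, and the built-in "html.parser" is used instead of "lxml" only so the example has no extra dependency:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup: the real page layout is not shown here,
# so one class="date" element per table is an assumption.
SAMPLE_HTML = """
<div class="date">2019-11-01</div>
<table><tr><th>Name</th><th>Score</th></tr><tr><td>A</td><td>1</td></tr></table>
<div class="date">2019-11-02</div>
<table><tr><th>Name</th><th>Score</th></tr><tr><td>B</td><td>2</td></tr></table>
"""


def tables_to_rows(html):
    """Return CSV-ready rows: each table's date row, then its cell rows."""
    tree = BeautifulSoup(html, "html.parser")
    dates = [tag.get_text(strip=True) for tag in tree.find_all(class_="date")]
    rows = []
    # zip() stops at the shorter sequence, so unmatched dates or tables are skipped.
    for date, table_tag in zip(dates, tree.select("table")):
        rows.append([date])  # one-element list, so the date stays in one cell
        for row_data in table_tag.select("tr"):
            rows.append([cell.get_text(strip=True)
                         for cell in row_data.select("th,td")])
    return rows


rows = tables_to_rows(SAMPLE_HTML)

# Written to an in-memory buffer here; a real script would open a file
# with newline="" instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

If the dates and tables do not line up one to one, zip() silently drops the extras, so it may be worth comparing len(dates) with the number of tables before writing.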
Regarding "python - How to iterate code for all tables in HTML data?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58778520/