python - Extracting certain columns from a table with BeautifulSoup

Tags: python html xml web-scraping beautifulsoup

Hello, I am trying to determine the date an item was purchased on eBay using the HTML table on this page: https://offer.ebay.com/ws/eBayISAPI.dll?ViewBidsLogin&item=173653442617&rt=nc&_trksid=p2047675.l2564

My Python code:

import requests
from bs4 import BeautifulSoup

# URL of the eBay purchase-history page given above
item_link = 'https://offer.ebay.com/ws/eBayISAPI.dll?ViewBidsLogin&item=173653442617&rt=nc&_trksid=p2047675.l2564'

def soup_creator(url):
    # Download the eBay page for processing
    res = requests.get(url)
    # Raise an exception if the download failed
    res.raise_for_status()
    # Create a BeautifulSoup object for HTML parsing
    return BeautifulSoup(res.text, 'lxml')

soup = soup_creator(item_link)
purchases = soup.find('div', attrs={'class': 'BHbidSecBorderGrey'})
purchases = purchases.find_all('tr', attrs={'bgcolor': '#ffffff'})
for purchase in purchases:
    cells = purchase.find_all('td', {'align': 'left'})
    # The third left-aligned cell should hold the purchase date
    print(cells[2].get_text())

When I run it, the print statement outputs nothing, which I take to mean that no matching rows were found. I would like it to print something like this:

Jul-02-19 18:22:28 PDT
Jun-27-19 16:12:59 PDT
Jun-23-19 06:46:23 PDT
...

Best answer

Pandas:

With pandas this is very simple: index the right table and slice out the column.

import pandas as pd

table = pd.read_html('https://offer.ebay.com/ws/eBayISAPI.dll?ViewBidsLogin&item=173653442617&rt=nc&_trksid=p2047675.l2564')[4]
table['Date of Purchase']
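
The hard-coded index `[4]` depends on the page layout staying the same. As a minimal sketch of an alternative, assuming the purchase table is the only one containing the text "Date of Purchase", the `match` argument of `pd.read_html` can be used to keep only the tables whose text matches. The HTML below is a made-up stand-in for the eBay page, not its real markup, and the default parser is assumed to be available (lxml, as used elsewhere in this answer).

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the eBay page (an assumption, not the real markup)
SAMPLE_HTML = """
<table><tr><th>Menu</th></tr><tr><td>Help</td></tr></table>
<table>
  <tr><th>User ID</th><th>Price</th><th>Date of Purchase</th></tr>
  <tr><td>a***b</td><td>US $10.00</td><td>Jul-02-19 18:22:28 PDT</td></tr>
  <tr><td>c***d</td><td>US $10.00</td><td>Jun-27-19 16:12:59 PDT</td></tr>
</table>
"""

# match= keeps only the tables containing text that matches the string/regex
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Date of Purchase")
purchase_dates = tables[0]["Date of Purchase"].tolist()
print(purchase_dates)
```

Against the live page, the URL would be passed in place of the `StringIO` object; whether a single table matches there has not been verified.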

bs4 method 1:

Since you know the column number, you can use nth-of-type on the table of interest.

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://offer.ebay.com/ws/eBayISAPI.dll?ViewBidsLogin&item=173653442617&rt=nc&_trksid=p2047675.l2564')
soup = bs(r.content, 'lxml')
# if the column number is known
purchases = [item.text for item in soup.select('table[width] td:nth-of-type(5)')]
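
To see what the selector picks up without hitting eBay, here is a small offline sketch. The table is a made-up stand-in with four columns (so the date sits in position 4 rather than 5), `column_by_position` is a hypothetical helper name, and the built-in `html.parser` is used only so the sample does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the purchase-history table (an assumption)
SAMPLE_HTML = """
<table width="100%">
  <tr><th>User ID</th><th>Price</th><th>Quantity</th><th>Date of Purchase</th></tr>
  <tr><td>a***b</td><td>US $10.00</td><td>1</td><td>Jul-02-19 18:22:28 PDT</td></tr>
  <tr><td>c***d</td><td>US $10.00</td><td>2</td><td>Jun-27-19 16:12:59 PDT</td></tr>
</table>
"""

def column_by_position(html, position):
    """Return the text of the position-th <td> (1-based) of every row."""
    soup = BeautifulSoup(html, "html.parser")
    selector = f"table[width] td:nth-of-type({position})"
    return [td.get_text(strip=True) for td in soup.select(selector)]

dates = column_by_position(SAMPLE_HTML, 4)
print(dates)
```

The header row contributes nothing because it contains `th` cells only, and `nth-of-type` counts `td` siblings within each row.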

bs4 method 2 (less ideal, for when the column number is not known):

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://offer.ebay.com/ws/eBayISAPI.dll?ViewBidsLogin&item=173653442617&rt=nc&_trksid=p2047675.l2564')
soup = bs(r.content, 'lxml')
# if the column number is not known
headers = [item.text.strip() for item in soup.select('table[width] th')]
desired_header = 'Date of Purchase'

if desired_header in headers: 
    print([item.text for item in soup.select('table[width] td:nth-of-type(' + str(headers.index(desired_header) + 1) + ')')])
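
The extracted values are plain strings. If actual date objects are needed, the following is a minimal sketch, assuming every value follows the `Jul-02-19 18:22:28 PDT` pattern shown in the question; `parse_purchase_date` is a hypothetical helper name. The zone abbreviation is split off rather than parsed, because the `%Z` directive of `strptime` only recognizes a few names (UTC, GMT, and the machine's local zone names).

```python
from datetime import datetime

def parse_purchase_date(text):
    """Split 'Jul-02-19 18:22:28 PDT' into a naive datetime and the zone label."""
    stamp, zone = text.strip().rsplit(" ", 1)
    # %b expects English month abbreviations (true under the default C locale)
    return datetime.strptime(stamp, "%b-%d-%y %H:%M:%S"), zone

when, zone = parse_purchase_date("Jul-02-19 18:22:28 PDT")
print(when.isoformat(), zone)
```

The returned datetime is naive; attaching a real time zone is left out of this sketch.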

Regarding python - extracting certain columns from a table with BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56895156/
