html - Is it possible to get the eBay item description with requests and BeautifulSoup?

Tags: html events web-scraping beautifulsoup python-requests

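I am trying to collect data on shoes listed on eBay. For each item I want to collect all of the data, including the custom description, in order to build a database. I have already collected every other field with requests and BS4, such as price, shipping, title, and so on. Unfortunately, the one thing still missing is the custom item description.

This seems to be event-driven HTML that loads automatically in the browser but not with requests and BS4. I would prefer to do this with requests and BS4, because the script is almost finished and scraping through something like Selenium is much slower. The example I am working with is as follows: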

from bs4 import BeautifulSoup as soup
import requests

source=requests.get("https://www.ebay.com/itm/SIGNED-2000-NIKE-AIR-JORDAN-1-HIGH-BANNED-1985-ROOKIE-RETRO-SHOES-AUTOGRAPH-UDA/392861887827?hash=item5b7864ad53:g:njcAAOSw9rpfALFX")
Nike_shoe = soup(source.text, "lxml")

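The part of the description I am trying to extract contains the following excerpt, among other things: [image: Excerpt of target HTML]. It can be found a little way down the eBay page. This description is part of the following HTML structure:

[image: Event_example]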

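When I look through the Nike_shoe soup, this text is not there. I have tried parsing source.text with lxml, html.parser, html5lib, and xml.

I have also tried the Requests-HTML package, which is supposed to have full JavaScript support: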

from requests_html import HTMLSession
session = HTMLSession()
source = session.get('https://www.ebay.com/itm/SIGNED-2000-NIKE-AIR-JORDAN-1-HIGH-BANNED-1985-ROOKIE-RETRO-SHOES-AUTOGRAPH-UDA/392861887827?hash=item5b7864ad53:g:njcAAOSw9rpfALFX')
Nike_shoe=soup(source.text, "html5lib")

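Unfortunately, I still cannot retrieve this data. I am also not familiar with this package, so I may be doing something wrong.

Edit 22/08/2020 13:41: Both answers below (@Andrej Kesely & @p1xel) give the correct result. p1xel's answer can be implemented as follows: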

source=requests.get("https://www.ebay.com/itm/SIGNED-2000-NIKE-AIR-JORDAN-1-HIGH-BANNED-1985-ROOKIE-RETRO-SHOES-AUTOGRAPH-UDA/392861887827?hash=item5b7864ad53:g:njcAAOSw9rpfALFX")
Nike_shoe = soup(source.text, "lxml")
iframe=requests.get(Nike_shoe.find(id="desc_ifr")["src"])
Custom_description = soup(iframe.text, "html5lib")
print(Custom_description.find("td").text)
                            SIGNED 2000NIKE AIR JORDAN 1 HIGH BANNED 1985 ROOKIE RETRO SHOES AUTOGRAPH UDA SIGNED IN PRESENCE OF UPPER DECK REPRESENTATIVES•  SHOES ARE OFFICIAL RETRO FROM 2000, BRAND NEW WITH ORIGINAL BOX AND RETRO CARD Beautiful signature accompanied by  CERTIFICATE OF AUTHENTICITY FROM THE UPPER DECK COMPANY, which currently HOLDS AN EXCLUSIVE RIGHTS to ALL authorized authentic Jordan autographed memorabilia & trading cards (No 3rd party authentication here!)Have a peace of mind knowing that YOU ARE GETTING THE REAL DEAL
RECENTLY ACQUIRED BIG COLLECTION FROM A PRIVATE COLLECTOR, PLEASE CHECK OUR AUCTION PERIODICALLY AS WE WILL CONTINUE TO POST NEW ITEMS DAILYIt says on the certificate that "Each individual product that bears the original autograph is signed in the presence of an Upper Deck Authenticated representative and registered by its numbered hologram and kept on permanent file", as part of UDA's patented 5-Step Hologram process. (NO LETTER OF OPINION HERE!)
Pictures are from the actual shoe you are bidding on....  BUY FROM A REPUTABLE COLLECTOR, Please check my feedbacks from previous satisfied buyers and bid with confidence.BUYER TO PAY $100 FOR FULLY INSURED shipping with tracking number & signature confirmation. International buyer are responsible for any import/customs duty fee that might be charged upon delivery of the packageRECENTLY ACQUIRED BIG COLLECTION FROM A PRIVATE COLLECTOR, PLEASE CHECK OUR AUCTION PERIODICALLY AS WE WILL CONTINUE TO POST NEW ITEMS DAILYALL SALES ARE FINAL. MAKE SURE TO CHECK MY OTHER AUCTIONS FOR MORE GREAT MJ MEMORABILIA

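Because p1xel's answer uses the same requests-based approach on the same page, I am choosing it as the accepted solution, but both solutions work.

Best answer

The description appears to be in an iframe.

You need to find the iframe with the ID desc_ifr and simply make a request to its src.

This should do what you want (untested):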

requests.get(Nike_shoe.find(id="desc_ifr")["src"])
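The two steps above (locate the iframe, then request its src) can be wrapped into a small helper. This is only a minimal sketch, not a tested eBay client: the function names find_description_src and fetch_description, the injectable get parameter, and the use of urljoin to handle a relative src are illustrative assumptions that do not appear in the original answer. The only detail taken from the answer is the iframe id desc_ifr.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def find_description_src(item_html, base_url):
    """Return the absolute URL of the description iframe, or None if absent.

    Hypothetical helper: only the id "desc_ifr" comes from the answer above.
    """
    page = BeautifulSoup(item_html, "html.parser")
    iframe = page.find("iframe", id="desc_ifr")
    if iframe is None or not iframe.get("src"):
        return None
    # urljoin leaves an absolute src unchanged and resolves a relative one.
    return urljoin(base_url, iframe["src"])


def fetch_description(item_url, get=requests.get):
    """Fetch an item page, follow its description iframe, and return the text.

    `get` defaults to requests.get; it is a parameter only so that the
    function can be exercised without network access.
    """
    item_response = get(item_url)
    src = find_description_src(item_response.text, item_url)
    if src is None:
        return None
    description = BeautifulSoup(get(src).text, "html.parser")
    # Collapse the markup into whitespace-separated text.
    return description.get_text(" ", strip=True)
```

Calling fetch_description with the item URL from the question should then return the description text, provided the page still exposes an iframe with that id. Returning None when the iframe is missing avoids the TypeError that subscripting the result of a failed find would raise.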

Regarding "html - Is it possible to get the eBay item description with requests and BeautifulSoup?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63531321/
