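I'm trying to scrape the following page with Scrapy: https://www2.trollandtoad.com/buylist/?_ga=2.123753418.115346513.1562026676-1813285172.1559913561#!/M/10591 . Some of the data is scraped correctly, but I can't get the card name, because it uses the same selector as the set name, so the card name field ends up holding the set name as well.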
def parse(self, response):
    # Initialize item to function GameItem located in items.py, will be called multiple times
    item = GameItem()
    # Extract card category from URL using html code from website that identifies the category. Will be outputted before rest of data
    for data in response.css("tr.ng-scope"):
        item["Set"] = data.css("a.ng-binding.ng-scope::text").get()
        if item["Set"] is None:
            item["Set"] = data.css("span.ng-binding.ng-scope::text").get()
        item["Card_Name"] = data.css("a.ng-binding.ng-scope::text").get()
        # Call item again in order to extract the condition, stock, and price using the corresponding html code from the website
        item["Condition"] = data.css(r"td\.5557170.buylist_condition::text").get()
        item["Quantity"] = data.css("span.ng-binding::text").get()
        item["Price"] = data.css("span.ng-binding::text").get()
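Update #1
I switched to XPath and now get the card name instead of the set name, but it returns the same card name for every row rather than a different one per row.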
item["Card_Name"] = data.xpath("/html/body/div[2]/div[2]/div[1]/table[1]/tbody/tr[1]/td[2]/a/text()").get()
Best answer
card_names = response.xpath("//div/table/tbody/tr/td[contains(@class,'buylist_productname item')]/a/text()").getall()
This returns a list of the individual card names, in the order they appear on the page.
Source: "python - Data not being scraped correctly" on Stack Overflow: https://stackoverflow.com/questions/56858673/