python - How to parse the gap?

Tags: python xpath python-3.x lxml

Please help me get the price from an eBay page.

In the script below, I fetch the price from two specific pages.

import requests
import lxml.html


def get_doc(url):
    """Fetch a page and parse it into an lxml HTML document (None on failure)."""
    try:
        req = requests.get(url)
    except Exception as exc:
        # Print the caught exception instance, not the Exception class itself.
        print('Error open. __', exc)
        return None
    return lxml.html.document_fromstring(req.text)


urls = [
    'http://www.ebay.com/itm/DW-PDP-Concept-Pearlescent-White-Maple-Drumset-/121271668104?pt=US_Drums&hash=item1c3c5acd88',
    'http://www.ebay.com/itm/LOT-OF-20-DRUM-SET-TUNING-KEYS-DW-TAMA-PEARL-SABIAN-and-OTHER-UNIQUE-KEYS-/291092068092?pt=US_Drums&hash=item43c67076fc',
]

for url in urls:
    doc = get_doc(url)
    if doc is None:
        continue  # skip pages that could not be fetched
    title = doc.xpath('//h1[@id="itemTitle"]/text()')
    priceUSD = doc.xpath('//span[@itemprop="price"]/text()')
    print(title, priceUSD)

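The problem is that the price on the first page contains a non-breaking space ('&nbsp;'), so text() in the XPath returns the wrong value. The output looks like this: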

['DW/PDP Concept Pearlescent White Maple Drumset'] ['US $1\xa0200,00']
['LOT OF 20 DRUM SET TUNING KEYS! DW! TAMA! PEARL! SABIAN! and OTHER UNIQUE KEYS!!'] ['US $6,05']

P.S. The price it returns is incorrect: 'US $1\xa0200,00'

Best answer

Replace \xa0 with an empty string:

priceUSD = [t.replace('\xa0', '') for t in
            doc.xpath('//span[@itemprop="price"]/text()')]
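
To see what the replacement does without fetching a live page, here is a minimal sketch run against the strings from the question's output. `\xa0` is U+00A0, the character that `&nbsp;` decodes to. `strip_nbsp` is a hypothetical helper name used only for illustration; it is not part of lxml.

```python
def strip_nbsp(values):
    """Remove non-breaking spaces (U+00A0) from each string in a list."""
    return [v.replace('\xa0', '') for v in values]


# Sample values copied from the output shown in the question.
prices = ['US $1\xa0200,00', 'US $6,05']
cleaned = strip_nbsp(prices)
print(cleaned)  # ['US $1200,00', 'US $6,05']
```

Note that this only removes the space; the comma used as a decimal separator is left untouched.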

By the way, I get the following output without any modification:

['DW/PDP Concept Pearlescent White Maple Drumset'] ['US $1,200.00']
['LOT OF 20 DRUM SET TUNING KEYS! DW! TAMA! PEARL! SABIAN! and OTHER UNIQUE KEYS!!'] ['US $6.05']
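
The two outputs differ in number format ('1\xa0200,00' versus '1,200.00'), which presumably means the page is served in a format that depends on the client's region. If the goal is a numeric value, one possible approach is to normalize both forms. The sketch below is an illustration only: `parse_price` is a hypothetical helper, and it assumes that a '.' or ',' followed by exactly two trailing digits is the decimal separator.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as 'US $1\xa0200,00' to a Decimal."""
    # Drop non-breaking and ordinary spaces used as thousands separators.
    compact = text.replace('\xa0', '').replace(' ', '')
    match = re.search(r'\d[\d.,]*', compact)
    if match is None:
        raise ValueError('no number found in %r' % text)
    number = match.group(0)
    # Assumption: a separator followed by exactly two final digits
    # marks the fractional part; any other separator groups thousands.
    split = re.fullmatch(r'(.*?)[.,](\d{2})', number)
    if split:
        whole, frac = split.group(1), split.group(2)
    else:
        whole, frac = number, '00'
    whole = re.sub(r'[.,]', '', whole)
    return Decimal(whole + '.' + frac)


for raw in ['US $1\xa0200,00', 'US $1,200.00', 'US $6,05', 'US $6.05']:
    print(raw, '->', parse_price(raw))
```

Using `Decimal` rather than `float` avoids binary rounding issues with currency amounts. The two-digit assumption would misread a value such as '1,20' meant as one thousand two hundred, so it should be checked against the pages actually being scraped.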

Regarding "python - How to parse the gap?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22125654/
