Python: robustly extracting table data with XPath

Tags: python xml python-2.7 xpath lxml

I only want to get the data under "WHOLE MARKET TOP GAINERS" from http://www.images.watoday.com.au/business/markets/movers.

My code is as follows:

import requests
from lxml import html

page_gain = requests.get('http://www.images.watoday.com.au/business/markets/movers')
gain = html.fromstring(page_gain.content)
name = gain.xpath('//h2[contains(.,"Whole Market Top Gainers")]/following::a/text()')
data = gain.xpath('//h2[contains(.,"Whole Market Top Gainers")]/following::td/text()')

The output I want is:

['MEM','MEMPHASYS LTD','0.002','0.001','100.00','1,000,000','AUH','AUSTCHINA HOLDINGS','0.007','0.002','40.00','1,433,311'....] 

Best answer

How about restricting the match to the rows of the first table (following::table[1]) that comes after the text "Whole Market Top Gainers":

>>> gain = ...  # the tree parsed with html.fromstring() in the question
>>> expr = ('//h2[contains(.,"Whole Market Top Gainers")]'
...         '/following::table[1]/tbody/tr')
>>> rows = gain.xpath(expr)
>>> [[td.text_content().strip() for td in row] for row in rows]
[['AJC', 'ACACIA COAL LTD', '0.002', '0.001rise', '100.00rise', '92,525'],
 ['MEM', 'MEMPHASYS LTD', '0.002', '0.001rise', '100.00rise', '1,000,000'],
 ['AUH', 'AUSTCHINA HOLDINGS', '0.007', '0.002rise', '40.00rise', '1,433,311'],
 ['AO1', 'ASSETOWL LIMITED', '0.100', '0.025rise', '33.33rise', '249,180'],
 ['BAS', 'BASS OIL LTD', '0.004', '0.001rise', '33.33rise', '15,390,472'],
 ['RNL', 'RISION LIMITED', '0.004', '0.001rise', '33.33rise', '6,100,812'],
 ['PAB', 'PATRYS LIMITED', '0.061', '0.013rise', '27.08rise', '86,337,514'],
 ['IQ3', 'IQ3CORP LIMITED', '0.250', '0.050rise', '25.00rise', '6,000'],
 ['SMA', 'SMARTTRANS HOLDINGS', '0.005', '0.001rise', '25.00rise', '70,000'],
 ['SEI', 'SPECIALITY METALINT', '0.035', '0.006rise', '20.69rise', '12,162,844']]
# To drop the trailing `rise`, use td.text_content().strip().replace('rise', '') instead.
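
To get the flat list asked for in the question, the same idea can be sketched end to end as below. This is only a sketch under stated assumptions: SAMPLE is a hand-written stand-in for the real page (its markup, including the <span> that carries "rise", is a guess), top_gainers and TREND_SUFFIX are hypothetical names, and the "fall" marker is assumed by analogy with "rise". It also swaps /tbody/tr for //tr[td], which skips header rows and still works if the parsed tree has no tbody element.

```python
import re
from lxml import html

# Hand-written stand-in for the real page; the markup is an assumption.
SAMPLE = """
<html><body>
<h2>Whole Market Top Gainers</h2>
<table>
  <thead><tr><th>Code</th><th>Company</th><th>Price</th>
             <th>Change</th><th>% Change</th><th>Volume</th></tr></thead>
  <tbody>
    <tr><td><a href="#">MEM</a></td><td><a href="#">MEMPHASYS LTD</a></td>
        <td>0.002</td><td>0.001<span>rise</span></td>
        <td>100.00<span>rise</span></td><td>1,000,000</td></tr>
    <tr><td><a href="#">AUH</a></td><td><a href="#">AUSTCHINA HOLDINGS</a></td>
        <td>0.007</td><td>0.002<span>rise</span></td>
        <td>40.00<span>rise</span></td><td>1,433,311</td></tr>
  </tbody>
</table>
<h2>Whole Market Top Losers</h2>
<table><tbody>
  <tr><td>XYZ</td><td>EXAMPLE LTD</td><td>0.010</td>
      <td>0.005<span>fall</span></td><td>33.33<span>fall</span></td>
      <td>500</td></tr>
</tbody></table>
</body></html>
"""

# Strip a trailing "rise"/"fall" only when it directly follows a digit,
# so a company name that happens to end in "rise" is left untouched.
TREND_SUFFIX = re.compile(r'(?<=\d)(?:rise|fall)$')


def top_gainers(tree):
    """Return the cells of the first table after the heading as a flat list."""
    rows = tree.xpath('//h2[contains(., "Whole Market Top Gainers")]'
                      '/following::table[1]//tr[td]')
    flat = []
    for row in rows:
        for td in row.xpath('./td'):
            flat.append(TREND_SUFFIX.sub('', td.text_content().strip()))
    return flat


result = top_gainers(html.fromstring(SAMPLE))
print(result)
```

Against the live page, html.fromstring(page_gain.content) from the question would take the place of html.fromstring(SAMPLE), provided the page still has the structure the answer relies on.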

This topic is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/49143689/
