python - How to parse an HTML table with Python and BeautifulSoup and write it to CSV

I am trying to parse an HTML page, extract the currency values, and write them to a CSV file. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

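The problem is that I don't know how to retrieve only the currency values. I tried a few regular expressions such as "^[0-9]{3}" (starts with three digits), but they didn't work.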

Best answer

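You are better off picking out specific cells in the table. The td cells with the class cell_c contain the data you are interested in, and the last one in each row is always the currency rate: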

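rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # skip rows without td cells (e.g. header rows) and cells with no class attribute
    if cols and 'cell_c' in cols[0].get('class', ''):
        # currency row: the unpacking below expects exactly five cells
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate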

With the data in separate variables, you can now convert the text to decimal numbers, store the values in a database, and so on.
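Since the question also asks about writing to CSV, here is a minimal sketch of that last step. It is written for Python 3 and uses only the standard library, so it takes the cell texts as plain strings rather than scraping them. The function name write_rates, the column headers, and the sample rows are all hypothetical, and the semicolon delimiter is only an assumption based on the ';' separator in the question's code.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def write_rates(rows, out):
    """Write (digital_code, letter_code, units, name, rate) rows as CSV.

    Rows whose rate is not a valid decimal number are skipped.
    Returns the number of data rows written.
    """
    writer = csv.writer(out, delimiter=';', lineterminator='\n')
    writer.writerow(['digital_code', 'letter_code', 'units', 'name', 'rate'])
    written = 0
    for digital_code, letter_code, units, name, rate in rows:
        try:
            # Decimal keeps the rate exact instead of using binary floats.
            rate_value = Decimal(rate.strip())
        except InvalidOperation:
            continue  # not a number, so not a usable currency row
        writer.writerow([digital_code.strip(), letter_code.strip(),
                         units.strip(), name.strip(), rate_value])
        written += 1
    return written


# Hypothetical sample rows standing in for the scraped cell texts.
sample_rows = [
    ('036', 'AUD', '100', 'Australian Dollar', '834.4210'),
    ('826', 'GBP', '100', 'Pound Sterling', ' 1217.4930 '),
    ('000', 'XXX', '1', 'Broken row', 'n/a'),
]

buffer = io.StringIO()
count = write_rates(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a real file instead of an in-memory buffer, pass a file object opened with open('rates.csv', 'w', newline='') (the file name is just an example). In the loop above, you would collect each row's five values into a list instead of printing them, then hand that list to the function.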

A similar question to "python - How to parse an HTML table with Python and BeautifulSoup and write it to CSV" can be found on Stack Overflow: https://stackoverflow.com/questions/15250455/
