I have tried everything I know, but I can't seem to find a solution.
import csv
import requests
from lxml import html
from itertools import izip
list_names_atp = []
page = requests.get('http://www.atpworldtour.com/en/rankings/singles')
tree = html.fromstring(page.content)
list_rank_atp = []
for i in range(0,101):
    result = tree.xpath('//*[@id="rankingDetailAjaxContainer"]/table/tbody/tr[%s]/td[1]/text()'%(i))
    list_rank_atp.append(result)
list_names_atp = []
for i in range(0,101):
    result1 = tree.xpath('//*[@id="rankingDetailAjaxContainer"]/table/tbody/tr[%s]/td[4]/a/text()'%(i))
    list_names_atp.append(result1)
list_Final =[]
for i in izip(list_rank_atp, list_names_atp):
    uitkom = i
    list_Final.append(uitkom)
outfile = open("./tennis.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Rank", "Name"])
writer.writerows(list_Final)
The CSV output looks like this:
But I want it to be:
Best answer
A few notes:
XPath indexes start at 1, not 0. That is why the entries for the first data row are empty.
You can use Python's strip() or XPath's normalize-space() to remove the whitespace around the rank number text.
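The two notes above can be checked on a small inline document. This is only a sketch: the SAMPLE markup is made up for illustration and is not the real rankings page, so the XPath expressions are shortened to match it.

```python
from lxml import html

# Hypothetical sample markup standing in for the real rankings table.
SAMPLE = """
<table>
  <tr><td>
      1
  </td><td><a>Player A</a></td></tr>
  <tr><td>
      2
  </td><td><a>Player B</a></td></tr>
</table>
"""

tree = html.fromstring(SAMPLE)

# Position 0 never matches, because XPath positions start at 1.
row_zero = tree.xpath('//tr[0]/td[1]/text()')

# Position 1 is the first row; the raw text keeps its surrounding whitespace.
raw_first = tree.xpath('//tr[1]/td[1]/text()')

# Option 1: strip the whitespace in Python.
stripped_first = raw_first[0].strip()

# Option 2: let XPath trim it; normalize-space() returns a string, not a list.
normalized_first = tree.xpath('normalize-space(//tr[1]/td[1])')

print(row_zero, repr(raw_first[0]), stripped_first, normalized_first)
```

Note that normalize-space() also collapses runs of internal whitespace into single spaces, which strip() does not do.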
I would suggest iterating over the rows (tr) and, in each iteration, getting all the information you need from the current row:
page = requests.get('http://www.atpworldtour.com/en/rankings/singles')
tree = html.fromstring(page.content)
outfile = open("./tennis.csv", "wb")
writer = csv.writer(outfile)
rows = tree.xpath('//*[@id="rankingDetailAjaxContainer"]/table/tbody/tr')
writer.writerow(["Rank", "Name"])
for row in rows:
    no = row.xpath('td[1]/text()')[0].strip()
    name = row.xpath('td[4]/a/text()')[0]
    writer.writerow([no, name])
outfile.close()
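For readers on Python 3, the same row-by-row idea might look like the sketch below. It is not the answer's original code: the SAMPLE markup, the extract_rows and write_csv helpers, and the in-memory buffer are all assumptions made for illustration, and the live page layout may differ.

```python
import csv
import io

from lxml import html

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE = """
<div id="rankingDetailAjaxContainer">
  <table>
    <tbody>
      <tr><td> 1 </td><td></td><td></td><td><a>Player A</a></td></tr>
      <tr><td> 2 </td><td></td><td></td><td><a>Player B</a></td></tr>
    </tbody>
  </table>
</div>
"""


def extract_rows(tree):
    """Yield one (rank, name) pair per table row."""
    rows = tree.xpath('//*[@id="rankingDetailAjaxContainer"]/table/tbody/tr')
    for row in rows:
        # normalize-space() trims the text and returns a plain string.
        rank = row.xpath('normalize-space(td[1])')
        name = row.xpath('normalize-space(td[4]/a)')
        yield rank, name


def write_csv(rows, outfile):
    """Write a header line followed by the (rank, name) pairs."""
    writer = csv.writer(outfile)
    writer.writerow(["Rank", "Name"])
    writer.writerows(rows)


tree = html.fromstring(SAMPLE)
buffer = io.StringIO(newline="")  # in-memory stand-in for a real file
write_csv(extract_rows(tree), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk in Python 3, open the file in text mode with open("tennis.csv", "w", newline="") inside a with block; the "wb" mode used above is the Python 2 convention for the csv module.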
Regarding python - XPath removing whitespace in a Python list, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36810583/