from lxml import html
import requests
page = requests.get('http://officequotes.net/no1-01.php')
tree = html.fromstring(page.content)
complete_script = tree.xpath('/html/body/table/tbody/tr[2]/td[2]')
print(complete_script)
I want to display the entire (TV show) script, but all I get is an empty list.
Best answer
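You can skip tbody and grab the table rows directly. Browsers usually insert a tbody element into the DOM even when the raw HTML that requests downloads does not contain one, so an XPath copied from the browser's developer tools can match nothing: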
from lxml import html
import requests
page = requests.get('http://officequotes.net/no1-01.php')
tree = html.fromstring(page.content)
complete_script = tree.xpath('//table/tr[2]/td[2]//text()')
# strip surrounding whitespace from each text node and drop the empty ones
results = [text.strip() for text in complete_script if text.strip()]
print(results)
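To see why a path containing tbody comes back empty, here is a minimal offline sketch. It swaps in the standard library's xml.etree.ElementTree for lxml so it runs without network access or extra packages, and the inline markup (RAW_HTML, the "Speaker A/B" lines) is made up for illustration rather than taken from the real page. The behaviour it shows is the same idea: a path step that is not in the downloaded source matches nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the raw page source: note there is no <tbody>.
RAW_HTML = """
<html><body><table>
  <tr><td>nav</td><td>header</td></tr>
  <tr><td>sidebar</td><td>
    <b>Speaker A:</b> First line.
    <b>Speaker B:</b> Second line.
  </td></tr>
</table></body></html>
""".strip()

root = ET.fromstring(RAW_HTML)  # root is the <html> element

# Path as copied from a browser (with tbody): nothing in the source matches.
with_tbody = root.findall("./body/table/tbody/tr[2]/td[2]")

# Same path without tbody: matches the second cell of the second row.
without_tbody = root.findall("./body/table/tr[2]/td[2]")

# Collect the cell's text nodes, stripping whitespace and dropping empties.
cell = without_tbody[0]
lines = [text.strip() for text in cell.itertext() if text.strip()]

print(with_tbody)  # []
print(lines)       # ['Speaker A:', 'First line.', 'Speaker B:', 'Second line.']
```

If a selector works in the browser but not in a script, comparing it against page.text (the raw source) rather than the rendered DOM is usually the quickest check.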
That said, I prefer BeautifulSoup, which makes it easy to extract the same content:
from bs4 import BeautifulSoup
import requests
page = requests.get('http://officequotes.net/no1-01.php')
soup = BeautifulSoup(page.content,'lxml')
complete_script = soup.select('table > tr > td')[2].get_text()
print(complete_script)
Regarding "python - I am trying to retrieve a script from a TV show using XPath, but it returns an empty list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57211807/