python - Getting the maximum value in a table using XPath

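I have a large HTML menu file in a generic file format, and I need to get the highest price for each menu item. Here is a sample of part of the menu file: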

### File Name: "menu" (All types ".") ###
</div>
     <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                10
            </td>
            <td class="menu-item-price-amount">
                14
            </td>
        </tr>
</div>

</div>
     <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                100
            </td>
            <td class="menu-item-price-amount">
                1
            </td>
        </tr>
</div>

I need my program to return a list of the highest price in each menu item, i.e. maxprices=['14','100'] in this example. I tried the following code in Python:

#!/usr/bin/python

from lxml import html
from os.path import join, dirname, realpath
from lxml.etree import XPath

def main():
    """ Drive function """
    fpath = join(dirname(realpath(__file__)), 'menu')
    hfile = open(fpath)  # open html file
    tree = html.fromstring(hfile.read())

    prices_path = XPath('//*[@class="menu-item-prices"]/table/tr')  
    maxprices = []

    for p in prices_path(tree):
        prices = p.xpath('//td/text()')
        prices = [el.strip() for el in prices]
        maxprice = max(prices)
        maxprices.append(maxprice)
        print maxprices

if __name__ == '__main__':
    main()

I also tried

prices = tree.xpath('//*[@class="menu-item-prices"]'
                    '//tr[not(../tr/td > td)]/text()')
prices = [el.strip() for el in prices]

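in place of the loop strategy above. Neither returns the highest price for each category. How can I modify my code to get these prices correctly? Thanks.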

Best answer

There is at least one problem: you are comparing strings. Convert each price to float first, then take the maximum for each table row.
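To see why the string comparison matters, here is a minimal sketch with made-up prices (the values in `raw` are hypothetical and not taken from the menu file). Strings compare character by character, so "9" sorts after "10":

```python
# Hypothetical price strings, as they might look after .strip()
raw = ["9", "10", "7.5"]

# Lexicographic comparison: "9" > "7.5" > "10" because '9' > '7' > '1'
string_max = max(raw)

# Numeric comparison after converting each price to float
numeric_max = max(float(p) for p in raw)

# Compare numerically but keep the original string form
largest_as_text = max(raw, key=float)

print(string_max)       # 9
print(numeric_max)      # 10.0
print(largest_as_text)  # 10
```

The sample rows in the question happen to give the right answer even as strings, which can hide the problem until a row like the one above shows up.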

Complete example:

# -*- coding: utf-8 -*-
from lxml.html import fromstring

data = """
<div>
     <div class="menu-item-prices">
       <table>
            <tr>
                <td class="menu-item-price-amount">
                    10
                </td>
                <td class="menu-item-price-amount">
                    14
                </td>
            </tr>
        </table>
    </div>

    <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                100
            </td>
            <td class="menu-item-price-amount">
                1
            </td>
        </tr>
        </table>
    </div>
</div>
"""

tree = fromstring(data)
for item in tree.xpath("//div[@class='menu-item-prices']/table/tr"):
    prices = [float(price.strip()) for price in item.xpath(".//td[@class='menu-item-price-amount']/text()")]
    print(max(prices))

Prints:

14.0
100.0
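
A second thing worth checking in the question's loop, as far as I can tell: `p.xpath('//td/text()')` begins with `//`, which XPath evaluates from the document root, so every row sees all the cells in the file rather than only its own. The sketch below assumes lxml is installed and uses a condensed copy of the sample markup; the names `SAMPLE` and `max_prices` are made up for illustration. It counts the cells matched by the absolute and relative forms, then builds the list in the string form the question asked for.

```python
from lxml.html import fromstring

# Condensed copy of the sample markup (well-formed, two menu items)
SAMPLE = """
<div>
  <div class="menu-item-prices">
    <table><tr>
      <td class="menu-item-price-amount"> 10 </td>
      <td class="menu-item-price-amount"> 14 </td>
    </tr></table>
  </div>
  <div class="menu-item-prices">
    <table><tr>
      <td class="menu-item-price-amount"> 100 </td>
      <td class="menu-item-price-amount"> 1 </td>
    </tr></table>
  </div>
</div>
"""

ROW_PATH = "//div[@class='menu-item-prices']/table/tr"


def max_prices(markup):
    """Return the highest price of each row, kept as the original string."""
    tree = fromstring(markup)
    result = []
    for row in tree.xpath(ROW_PATH):
        # The leading "." keeps the search inside the current row
        texts = row.xpath(".//td[@class='menu-item-price-amount']/text()")
        cells = [t.strip() for t in texts if t.strip()]
        if cells:  # skip rows without any price cell
            result.append(max(cells, key=float))
    return result


tree = fromstring(SAMPLE)
first_row = tree.xpath(ROW_PATH)[0]
absolute_count = len(first_row.xpath("//td/text()"))   # whole document
relative_count = len(first_row.xpath(".//td/text()"))  # this row only

maxprices = max_prices(SAMPLE)
print(absolute_count, relative_count)
print(maxprices)
```

With this sample the absolute form matches 4 cells and the relative form 2, and `maxprices` comes out as `['14', '100']`. Using `max(..., key=float)` compares numerically while returning the original string.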

Regarding python - Getting the maximum value in a table using XPath, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35464762/
