python - Scraping the bottom row of a table

Tags: python html python-3.x beautifulsoup

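I'm using Python 3.4. I know how to scrape web pages with BeautifulSoup, but I'm trying to work out the most efficient way to do this task. The Nexus factory image page (Android) lists all Nexus devices and is updated whenever a new version becomes available. The newest version is always added to the bottom of the corresponding table. I have a list of names for each device, both the real name and the code name, and I only pull those (the devices themselves are updated once a year at most, and only some of them still receive updates).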

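What is the most efficient way to pull the bottom entry from each of these tables? I plan to save the string in the first <td> of each bottom row as a pickled object, so that I can later compare strings to check whether the current bottom row is new, but I'm not sure what the best way to grab the entry itself is.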

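Each <tr> has an ID in the format devnamebuildnumber. Since I have the name of each device and will have the latest build string, I should be able to search with soup.find("tr", id=dev + buildstring). However, this returns every sibling and child of the row that is found, so I'm not sure how best to make use of it.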
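For reference, here is a minimal sketch of the lookup described above, run on a small hypothetical HTML fragment rather than the real page (the markup, ids, and variable names are assumptions made for illustration). On well-formed markup, find() returns a single Tag, and printing that Tag also prints its descendants; the version string can be read from its first <td>. The sketch also shows a way to reach the bottom row without knowing its id in advance.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the layout described in the question.
SAMPLE_HTML = """
<h2 id="angler">"angler" for Nexus 6P</h2>
<table>
  <tr><th>Version</th><th>Download</th></tr>
  <tr id="anglermda89d"><td>6.0.0 (MDA89D)</td><td>Link</td></tr>
  <tr id="anglermmb29p"><td>6.0.1 (MMB29P)</td><td>Link</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns one Tag (or None if nothing matches).
row = soup.find("tr", id="anglermmb29p")
version = row.find("td").get_text(strip=True)

# Without knowing the id, take the last <tr> that carries an id attribute.
bottom_row = soup.find("table").find_all("tr", id=True)[-1]
bottom_version = bottom_row.find("td").get_text(strip=True)

print(version)
print(bottom_version)
```

Filtering on id=True skips the header row, which has no id in this sample; whether the real page follows the same pattern would need to be checked against its actual markup.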

Best answer

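Here is something to get you started. The idea is to get the h2 elements that have an id attribute; apart from the first one, these are the device name elements. For each element found, get the next table element and parse its versions into a list. Implementation: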

from pprint import pprint

import requests
from bs4 import BeautifulSoup


url = "https://developers.google.com/android/nexus/images"
response = requests.get(url)

soup = BeautifulSoup(response.content, "lxml")

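# Map each device heading to the list of versions in the table that follows it.
data = {}
# Every h2 with an id, except the first, is a device-name heading.
for device in soup.find_all("h2", id=True)[1:]:
    device_name = device.get_text(strip=True)

    # Rows with an id are build rows; the first <td> holds the version string.
    data[device_name] = [version.find("td").get_text(strip=True)
                         for version in device.find_next("table").find_all("tr", id=True)]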

pprint(data)

This prints a dictionary with the device names as keys and lists of versions as values:

{'"angler" for Nexus 6P': ['6.0.0 (MDA89D)',
                           '6.0.0 (MDB08K)',
                           '6.0.0 (MDB08L)',
                           '6.0.0 (MDB08M)',
                           '6.0.0 (MMB29N)',
                           '6.0.1 (MMB29M)',
                           '6.0.1 (MMB29P)'],
 '"bullhead" for Nexus 5X': ['6.0.0 (MDA89E)',
                             '6.0.0 (MDB08I)',
                             '6.0.0 (MDB08L)',
                             '6.0.0 (MDB08M)',
                             '6.0.1 (MMB29K)',
                             '6.0.1 (MMB29P)'],
 '"fugu" for Nexus Player': ['5.0 (LRX21M)',
                             '5.0 (LRX21V)',
                             '5.1.0 (LMY47D)',
                             '5.1.1 (LMY47V)',
                             '5.1.1 (LMY48J)',
                             '5.1.1 (LMY48N)',
                             '6.0.0 (MRA58K)',
                             '6.0.0 (MRA58N)',
                             '6.0.1 (MMB29M)',
                             '6.0.1 (MMB29T)'],
 '"hammerhead" for Nexus 5 (GSM/LTE)': ['4.4 (KRT16M)',
                                        '4.4.2 (KOT49H)',
                                        '4.4.3 (KTU84M)',
                                        '4.4.4 (KTU84P)',
                                        '4.4.4 Release 2 (For 2Degrees/NZ, '
                                        'Telstra/AUS and India ONLY) (KTU84Q)',
                                        '5.0 (LRX21O)',
                                        '5.0.1 (LRX22C)',
                                        '5.1.0 (LMY47D)',
                                        '5.1.0 (LMY47I)',
                                        '5.1.1 (LMY48B)',
                                        '5.1.1 (LMY48I)',
                                        '5.1.1 (LMY48M)',
                                        '6.0.0 (MRA58K)',
                                        '6.0.0 (MRA58N)',
                                        '6.0.1 (MMB29K)',
                                        '6.0.1 (MMB29S)'],
 '"mantaray" for Nexus 10': ['4.2.2 (JDQ39)',
                             '4.3 (JWR66Y)',
                             '4.4 (KRT16S)',
                             '4.4.2 (KOT49H)',
                             '4.4.3 (KTU84L)',
                             '4.4.4 (KTU84P)',
                             '5.0 (LRX21P)',
                             '5.0.1 (LRX22C)',
                             '5.0.2 (LRX22G)',
                             '5.1.0 (LMY47D)',
                             '5.1.1 (LMY47V)',
                             '5.1.1 (LMY48I)',
                             '5.1.1 (LMY48M)',
                             '5.1.1 (LMY48T)',
                             '5.1.1 (LMY48X)',
                             '5.1.1 (LMY48Z)',
                             '5.1.1 (LMY49F)'],
 '"mysid" for Galaxy Nexus "toro" (Verizon CDMA/LTE)': ['4.0.4 (IMM76K)',
                                                        '4.1.1 (JRO03O)',
                                                        '4.2.2 (JDQ39)'],
 '"mysidspr" for Galaxy Nexus "toroplus" (Sprint CDMA/LTE)': ['4.1.1 (FH05)',
                                                              '4.2.1 (GA02)'],
 '"nakasi" for Nexus 7 (Wi-Fi)': ['4.1.2 (JZO54K)',
                                  '4.2.2 (JDQ39)',
                                  '4.3 (JWR66Y)',
                                  '4.4 (KRT16S)',
                                  '4.4.2 (KOT49H)',
                                  '4.4.3 (KTU84L)',
                                  '4.4.4 (KTU84P)',
                                  '5.0 (LRX21P)',
                                  '5.0.2 (LRX22G)',
                                  '5.1.0 (LMY47D)',
                                  '5.1.1 (LMY47V)'],
 '"nakasig" for Nexus 7 (Mobile)': ['4.2.2 (JDQ39)',
                                    '4.3 (JWR66Y)',
                                    '4.4 (KRT16S)',
                                    '4.4.2 (KOT49H)',
                                    '4.4.3 (KTU84L)',
                                    '4.4.4 (KTU84P)',
                                    '5.0.2 (LRX22G)',
                                    '5.1.0 (LMY47D)',
                                    '5.1.1 (LMY47V)'],
 '"occam" for Nexus 4': ['4.2.2 (JDQ39)',
                         '4.3 (JWR66Y)',
                         '4.4 (KRT16S)',
                         '4.4.2 (KOT49H)',
                         '4.4.3 (KTU84L)',
                         '4.4.4 (KTU84P)',
                         '5.0 (LRX21T)',
                         '5.0.1 (LRX22C)',
                         '5.1.0 (LMY47O)',
                         '5.1.1 (LMY47V)',
                         '5.1.1 (LMY48I)',
                         '5.1.1 (LMY48M)',
                         '5.1.1 (LMY48T)'],
 '"razor" for Nexus 7 [2013] (Wi-Fi)': ['4.3 (JSS15Q)',
                                        '4.3 (JSS15R)',
                                        '4.4 (KRT16S)',
                                        '4.4.2 (KOT49H)',
                                        '4.4.3 (KTU84L)',
                                        '4.4.4 (KTU84P)',
                                        '5.0 (LRX21P)',
                                        '5.0.1 (LRX22C)',
                                        '5.0.2 (LRX22G)',
                                        '5.1.0 (LMY47O)',
                                        '5.1.1 (LMY47V)',
                                        '5.1.1 (LMY48G)',
                                        '5.1.1 (LMY48I)',
                                        '5.1.1 (LMY48M)',
                                        '5.1.1 (LMY48T)',
                                        '6.0.0 (MRA58K)',
                                        '6.0.0 (MRA58U)',
                                        '6.0.0 (MRA58V)',
                                        '6.0.1 (MMB29K)',
                                        '6.0.1 (MMB29O)'],
 '"razorg" for Nexus 7 [2013] (Mobile)': ['4.3 (JLS36C)',
                                          '4.3.1 (JLS36I)',
                                          '4.4 (KRT16S)',
                                          '4.4.2 (KOT49H)',
                                          '4.4.2_r2 (Verizon) (KVT49L)',
                                          '4.4.3 (KTU84L)',
                                          '4.4.4 (KTU84P)',
                                          '5.0.2 (LRX22G)',
                                          '5.1.0 (LMY47O)',
                                          '5.1.1 (LMY47V)',
                                          '5.1.1 (LMY48P)',
                                          '5.1.1 (LMY48U)',
                                          '5.1.1 (LMY48X)',
                                          '5.1.1 (LMY48Z)',
                                          '6.0.0 (MRA58K)',
                                          '6.0.0 (MRA58N)',
                                          '6.0.0 (MRA58V)',
                                          '6.0.0 (MRA59B)',
                                          '6.0.1 (MMB29K)',
                                          '6.0.1 (MMB29O)'],
 '"ryu" for Pixel C': ['6.0.1 (MXB48J)', '6.0.1 (MXB48K)'],
 '"shamu" for Nexus 6': ['5.0 (LRX21O)',
                         '5.0.1 (LRX22C)',
                         '5.1.0 (LMY47D)',
                         '5.1.0 (LMY47E)',
                         '5.1.0 (LMY47I)',
                         '5.1.0 (For T-Mobile ONLY) (LMY47M)',
                         '5.1.1 (All carriers except T-Mobile US) (LMY47Z)',
                         '5.1.1 (For T-Mobile ONLY) (LYZ28E)',
                         '5.1.1 (For Project Fi ONLY) (LVY48C)',
                         '5.1.1 (LMY48I)',
                         '5.1.1 (For T-Mobile ONLY) (LYZ28J)',
                         '5.1.1 (For Project Fi ONLY) (LVY48E)',
                         '5.1.1 (LMY48M)',
                         '5.1.1 (For T-Mobile ONLY) (LYZ28K)',
                         '5.1.1 (For Project Fi ONLY) (LVY48F)',
                         '5.1.1 (LMY48T)',
                         '5.1.1 (For T-Mobile ONLY) (LYZ28M)',
                         '5.1.1 (For Project Fi ONLY) (LVY48H)',
                         '5.1.1 (LMY48W)',
                         '5.1.1 (LMY48X)',
                         '5.1.1 (LMY48Y)',
                         '5.1.1 (For T-Mobile ONLY) (LYZ28N)',
                         '5.1.1 (For Project Fi ONLY) (LVY48I)',
                         '6.0.0 (MRA58K)',
                         '6.0.0 (MRA58N)',
                         '6.0.0 (MRA58R)',
                         '6.0.0 (MRA58X)',
                         '6.0.1 (MMB29K)',
                         '6.0.1 (MMB29S)'],
 '"soju" for Nexus S (worldwide version, i9020t and i9023)': ['2.3.6 (GRK39F)',
                                                              '4.0.4 (IMM76D)',
                                                              '4.1.2 (JZO54K)'],
 '"sojua" for Nexus S (850MHz version, i9020a)': ['2.3.6 (GRK39F)',
                                                  '4.0.4 (IMM76D)',
                                                  '4.1.2 (JZO54K)'],
 '"sojuk" for Nexus S (Korea version, m200)': ['2.3.6 (GRK39F)',
                                               '4.0.4 (IMM76D)',
                                               '4.1.1 (JRO03E)'],
 '"sojus" for Nexus S 4G (d720)': ['2.3.7 (GWK74)',
                                   '4.0.4 (IMM76D)',
                                   '4.1.1 (JRO03R)'],
 '"takju" for Galaxy Nexus "maguro" (GSM/HSPA+) (with Google Wallet)': ['4.0.4 '
                                                                        '(IMM76I)',
                                                                        '4.1.2 '
                                                                        '(JZO54K)',
                                                                        '4.2.2 '
                                                                        '(JDQ39)',
                                                                        '4.3 '
                                                                        '(JWR66Y)'],
 '"tungsten" for Nexus Q': ['4.0.4 (IAN67K)'],
 '"volantis" for Nexus 9 (Wi-Fi)': ['5.0 (LRX21Q)',
                                    '5.0 (LRX21R)',
                                    '5.0.1 (LRX22C)',
                                    '5.0.2 (LRX22L)',
                                    '5.1.1 (LMY47X)',
                                    '5.1.1 (LMY48I)',
                                    '5.1.1 (LMY48M)',
                                    '5.1.1 (LMY48T)',
                                    '6.0.0 (MRA58K)',
                                    '6.0.0 (MRA58N)',
                                    '6.0.1 (MMB29K)',
                                    '6.0.1 (MMB29S)'],
 '"volantisg" for Nexus 9 (LTE)': ['5.0.1 (LRX22C)',
                                   '5.0.2 (LRX22L)',
                                   '5.1.1 (LMY47X)',
                                   '5.1.1 (LMY48I)',
                                   '5.1.1 (LMY48M)',
                                   '5.1.1 (LMY48T)',
                                   '5.1.1 (LMY48X)',
                                   '5.1.1 (LMY48Z)',
                                   '5.1.1 (LMY49F)',
                                   '6.0.0 (MRA58K)',
                                   '6.0.0 (MRA58N)',
                                   '6.0.1 (MMB29K)',
                                   '6.0.1 (MMB29S)'],
 '"yakju" for Galaxy Nexus "maguro" (GSM/HSPA+)': ['4.0.4 (IMM76I)',
                                                   '4.1.2 (JZO54K)',
                                                   '4.2.2 (JDQ39)',
                                                   '4.3 (JWR66Y)']}
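
To go from these full version lists to the check described in the question (is the bottom row new?), one possible follow-up is sketched below. It assumes a data dictionary shaped like the output above. The names latest_builds and find_new_builds and the snapshot file path are hypothetical; they are not part of the original answer.

```python
import pickle
from pathlib import Path


def latest_builds(data):
    """Map each device name to the last (bottom-row) version in its list."""
    return {device: versions[-1] for device, versions in data.items() if versions}


def find_new_builds(data, state_path):
    """Return the devices whose bottom row differs from the saved snapshot."""
    state_path = Path(state_path)
    current = latest_builds(data)

    # Load the previous snapshot, if one exists.
    previous = {}
    if state_path.exists():
        with state_path.open("rb") as fh:
            previous = pickle.load(fh)

    changed = {device: build for device, build in current.items()
               if previous.get(device) != build}

    # Save the current snapshot for the next run.
    with state_path.open("wb") as fh:
        pickle.dump(current, fh)

    return changed
```

Calling find_new_builds(data, "latest_builds.pickle") after the scraping step would report every device on the first run, since no snapshot exists yet, and only the devices with a new bottom row on later runs.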

Source: a similar question on Stack Overflow about scraping the bottom row of a table: https://stackoverflow.com/questions/34998901/
