Python BeautifulSoup: building a dictionary of lists

Tags: python html list dictionary beautifulsoup

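I am trying to build a dictionary whose keys map to lists, so that I can later join on those keys. The code below includes the output it currently produces and the output I want. Can someone help me get the desired output as lists inside a dictionary? Note that the second group has no link; when that happens, can a placeholder value such as None be put in its place?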

import requests
from bs4 import BeautifulSoup
from collections import defaultdict

html='<tr><td align="right">1</td><td align="left"><a href="http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=VictoriaAzarenka">Victoria Azarenka</a></td><td align="left">BLR</td><td align="left">1989-07-31</td></tr> <tr><td align="right">1146</td><td align="left">Brittany Lashway</td><td align="left">USA</td><td align="left">1994-04-06</td></tr>'

soup = BeautifulSoup(html,'lxml')

for cell in soup.find_all('td'):
    if cell.find('a', href=True):
        print(cell.find('a', href=True).attrs['href'])
        print(cell.find('a', href=True).text)
    else:
        print(cell.text)

'''
Output From Code:
1 --> Rank
http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=VictoriaAzarenka --> Website
Victoria Azarenka --> Name
BLR --> Country
1989-07-31 --> Birth Date
1146 --> Rank
Brittany Lashway --> Name
USA --> Country
1994-04-06 --> Birth Date

Desired Output: (Dictionary Table with List component)

{Key, [Rank, Website,Name, Country, Birth Date]}
Example:
{1, [1, http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=VictoriaAzarenka, Victoria Azarenka, BLR, 1989-07-31]}
{2, [1146, None, Brittany Lashway, USA, 1994-04-06]}
'''

Best answer

You can do something like this with list and dictionary comprehensions:

from bs4 import BeautifulSoup as bs

html='<tr><td align="right">1</td><td align="left"><a href="http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=VictoriaAzarenka">Victoria Azarenka</a></td><td align="left">BLR</td><td align="left">1989-07-31</td></tr> <tr><td align="right">1146</td><td align="left">Brittany Lashway</td><td align="left">USA</td><td align="left">1994-04-06</td></tr>'

# Generator that yields the link (if any) and the text of each cell
def find_link_or_text(a):
    for cell in a:
        if cell.find('a', href=True):
            yield cell.find('a', href=True).attrs['href']
            yield cell.find('a', href=True).text
        else:
            yield cell.text

# Parse data using BeautifulSoup
data = bs(html, 'lxml')
# Keep only the parsed <td> elements
parsed = data.find_all('td')

# Group the cells in chunks of 4 (one chunk per table row)
sub = [list(find_link_or_text(parsed[k:k+4])) for k in range(0, len(parsed), 4)]

# Key each row's list by its 1-based position (1 to len(sub))
final = {key: value for key, value in zip(range(1, len(sub) +1), sub)}
print(final)

Output (note that the second row gets no placeholder for its missing link):

{1: ['1', 'http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=VictoriaAzarenka', 'Victoria Azarenka', 'BLR', '1989-07-31'], 2: ['1146', 'Brittany Lashway', 'USA', '1994-04-06']}
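
The question also asked for a None placeholder when a row has no link, which the output above does not include. A minimal sketch of one way to get it is shown below. It is a variation, not part of the original answer, and it rests on a few assumptions: the player name is always the second cell of each row, rows are walked via their tr elements instead of slicing the flat td list, and the built-in 'html.parser' is used in place of 'lxml' so that no extra parser needs to be installed. The helper name row_to_list is made up for illustration.

```python
from bs4 import BeautifulSoup

html = (
    '<tr><td align="right">1</td>'
    '<td align="left"><a href="http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=VictoriaAzarenka">Victoria Azarenka</a></td>'
    '<td align="left">BLR</td><td align="left">1989-07-31</td></tr> '
    '<tr><td align="right">1146</td><td align="left">Brittany Lashway</td>'
    '<td align="left">USA</td><td align="left">1994-04-06</td></tr>'
)

NAME_COLUMN = 1  # assumption: the name (and optional link) is the second cell


def row_to_list(row):
    """Flatten one <tr> into [rank, website, name, country, birth date]."""
    values = []
    for index, cell in enumerate(row.find_all('td')):
        if index == NAME_COLUMN:
            # Always emit a website slot; use None when the cell has no link.
            link = cell.find('a', href=True)
            values.append(link['href'] if link else None)
        values.append(cell.get_text(strip=True))
    return values


soup = BeautifulSoup(html, 'html.parser')

# Key each row by its 1-based position in the table.
final = {
    key: row_to_list(row)
    for key, row in enumerate(soup.find_all('tr'), start=1)
}
print(final)
```

With this version every list has five entries, so the second row becomes ['1146', None, 'Brittany Lashway', 'USA', '1994-04-06']. Iterating per tr also avoids relying on every row having exactly four cells.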

This question about building a dictionary of lists with Python and BeautifulSoup is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/44190212/
