python - Unable to get dictionary output in the right format

Tags: python python-3.x dictionary web-scraping

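I've written a scraper in Python to parse some data from a webpage. My goal is to store the data in a dictionary. Rather than demonstrating the full table, I tried it with a single tr containing one player's information. The data comes through, but 'player_info' ends up as a list wrapped inside another list instead of a flat list. Any help getting the format right would be highly appreciated.

Here is my attempt: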

import requests
from bs4 import BeautifulSoup

URL = "https://fantasy.premierleague.com/player-list/"

def get_data(link):
    res = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
    soup = BeautifulSoup(res.text,"lxml")
    data = []
    for content in soup.select("div.ism-container"):
        itmval = {}
        itmval['name'] = content.select_one("h2").text
        itmval['player_info'] = [[item.get_text(strip=True) for item in items.select("td")] for items in content.select(" table:nth-of-type(1) tr:nth-of-type(2)")]
        data.append(itmval)

    print(data)

if __name__ == '__main__':
    get_data(URL)

My output:

[{'name': 'Goalkeepers', 'player_info': [['De Gea', 'Man Utd', '161', '£5.9']]}]

My expected output:

[{'name': 'Goalkeepers', 'player_info': ['De Gea', 'Man Utd', '161', '£5.9']}]

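By the way, I intend to parse the full table; I've shown only a minimal portion here so that it is easy to examine.

Best answer

If you want to use a nested list comprehension, try replacing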

[[item.get_text(strip=True) for item in items.select("td")] for items in content.select(" table:nth-of-type(1) tr:nth-of-type(2)")]

with

[item.get_text(strip=True) for items in content.select(" table:nth-of-type(1) tr:nth-of-type(2)") for item in items.select("td")]
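The difference between the two comprehensions can be seen without any scraping. The sketch below is only an illustration: `rows` is a hypothetical stand-in for the td texts of the tr elements matched by content.select(...), and the second row is made-up placeholder data rather than anything from the page.

```python
# Hypothetical stand-in for the scraped data: each inner list plays the role
# of the <td> texts of one <tr>. The second row is a made-up placeholder.
rows = [
    ["De Gea", "Man Utd", "161", "£5.9"],
    ["Player B", "Club B", "0", "£0.0"],
]

# Shape from the question: a comprehension inside a comprehension builds
# one inner list per row, so even a single row stays wrapped in a list.
nested = [[cell for cell in row] for row in rows[:1]]

# Shape from the answer: two `for` clauses in ONE comprehension.
# They read left to right like nested loops (rows first, then cells),
# and every cell is appended to the same flat list.
flat = [cell for row in rows[:1] for cell in row]

# With more than one row, the flat form merges all rows together.
flat_all = [cell for row in rows for cell in row]

print(nested)    # one list per row
print(flat)      # a single flat list
print(len(flat_all))
```

Since the flat form merges every matched row into one list, it suits the single-row case shown in the question; when parsing the full table, keeping one inner list per row (the original nested form) may be the more useful shape.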

Regarding "python - Unable to get dictionary output in the right format", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50009153/
