python-3.x - BeautifulSoup grabs the names but not the Metascores from the web page

Tags: python-3.x beautifulsoup

With this code I get the names I want, but not the corresponding Metascore:

from requests import get
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

# Define the URL
url = "http://www.metacritic.com/browse/games/score/metascore/year/pc/filtered?sort=desc&year_selected=2018"

# Not sure about this, but it works: I was getting blocked by something, and this is the workaround I found
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

web_byte = urlopen(req).read()

webpage = web_byte.decode('utf-8')

# This grabs all the text from the page
html_soup = BeautifulSoup(webpage, 'html5lib')

# Select all the games, from 1 to 100 (the list of them)
game_containers = html_soup.find_all("div", class_="product_item product_title")

# print(game_containers)

game_names = html_soup.find_all("div", class_="product_item product_title")
game_metascores_p = html_soup.find_all("div", class_="metascore_w small game positive")
game_metascores_m = html_soup.find_all("div", class_="metascore_w small game mixed")
game_user_s = html_soup.find_all("span", class_="data textscore textscore_favorable")

#lists to store the data
names = []
metascores = []
userscores = []

#Extract data from each game
for games in game_names:

    name = games.find()
    names.append(name.text.strip())

    metascore = games.find_next_sibling()
    metascores.append(metascore.text.strip())

When I print the game names:

print(names)

I get a list of 100 names, as plain strings (which is what I want).

When I run this:

print(metascores)

I get:

['User:\n    7.6', 'User:\n    7.8', 'User:\n    7.0', 'User:\n    8.2', 'User:\n    7.3', 'User:\n    5.9', 'User:\n    7.2', 'User:\n    7.8', 'User:\n    8.1', 'User:\n    7.0', 'User:\n    8.5', 'User:\n    6.6', 'User:\n    7.2', 'User:\n    7.2', 'User:\n    7.3', 'User:\n    7.2', 'User:\n    7.5', 'User:\n    6.5', 'User:\n    7.5', 'User:\n    7.9', 'User:\n    7.8', 'User:\n    7.2', 'User:\n    7.6', 'User:\n    tbd', 'User:\n    7.9', 'User:\n    7.1', 'User:\n    6.1', 'User:\n    6.0', 'User:\n    tbd', 'User:\n    7.1', 'User:\n    6.6', 'User:\n    8.0', 'User:\n    7.7', 'User:\n    tbd', 'User:\n    7.5', 'User:\n    tbd', 'User:\n    8.1', 'User:\n    7.8', 'User:\n    7.7', 'User:\n    tbd', 'User:\n    7.9', 'User:\n    tbd', 'User:\n    5.4', 'User:\n    8.0', 'User:\n    tbd', 'User:\n    7.7', 'User:\n    8.0', 'User:\n    6.3', 'User:\n    8.0', 'User:\n    6.2', 'User:\n    8.3', 'User:\n    8.2', 'User:\n    8.3', 'User:\n    8.1', 'User:\n    5.1', 'User:\n    6.5', 'User:\n    7.5', 'User:\n    7.3', 'User:\n    6.7', 'User:\n    7.9', 'User:\n    tbd', 'User:\n    tbd', 'User:\n    7.2', 'User:\n    tbd', 'User:\n    tbd', 'User:\n    6.9', 'User:\n    5.4', 'User:\n    6.9', 'User:\n    tbd', 'User:\n    6.6', 'User:\n    7.9', 'User:\n    4.0', 'User:\n    6.8', 'User:\n    tbd', 'User:\n    6.1', 'User:\n    4.5', 'User:\n    6.2', 'User:\n    8.3', 'User:\n    4.5', 'User:\n    4.9', 'User:\n    7.7', 'User:\n    4.7', 'User:\n    7.9', 'User:\n    tbd', 'User:\n    tbd', 'User:\n    tbd', 'User:\n    6.9', 'User:\n    6.0', 'User:\n    tbd', 'User:\n    tbd', 'User:\n    tbd', 'User:\n    tbd', 'User:\n    4.6', 'User:\n    7.3', 'User:\n    tbd', 'User:\n    7.5', 'User:\n    6.8', 'User:\n    6.4', 'User:\n    tbd', 'User:\n    4.1']

Those are the user scores. (For the next variable, which will hold the user scores, I want only the number or "tbd", without the 'User:\n' prefix.)

So how do I get the Metascore and the user score (as plain strings)?
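
The output above is consistent with find_next_sibling() returning the element that follows the title, which appears to be the user-score block rather than the Metascore. The sketch below illustrates that reading on a small hand-written HTML fragment. The markup, including the class names product_score and product_userscore_txt, is an assumption made for illustration and has not been checked against the live page; only the product_title, metascore_w and textscore classes come from the question's code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumed shape for one row of the list, not the real page.
sample_html = """
<div class="product_row">
  <div class="product_item product_score">
    <div class="metascore_w small game positive">97</div>
  </div>
  <div class="product_item product_title">
    <a href="#">Example Game</a>
  </div>
  <div class="product_item product_userscore_txt">
    <span class="label">User:</span>
    <span class="data textscore textscore_favorable">7.6</span>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for title in soup.find_all("div", class_="product_title"):
    name = title.get_text(strip=True)
    # In this assumed layout the Metascore sits in the sibling BEFORE the title.
    score_div = title.find_previous_sibling("div", class_="product_score")
    metascore = score_div.get_text(strip=True) if score_div else None
    # The user score sits in the sibling AFTER the title; reading only the
    # value span means the "User:" label is never picked up.
    user_div = title.find_next_sibling("div", class_="product_userscore_txt")
    user_span = user_div.find("span", class_="data") if user_div else None
    userscore = user_span.get_text(strip=True) if user_span else None
    rows.append((name, metascore, userscore))

print(rows)
```

Matching on a single class (class_="product_title") rather than the full class string keeps the lookup working if the row carries extra classes, and the None fallbacks avoid an AttributeError when a row has no score element.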

Best answer

You can use replace():

str.replace("User:\n    ", "")

Like this:

metascoresNew = []
for i in metascores:
    temp = str(i)
    temp2 = temp.replace("User:\n    ", "")
    metascoresNew.append(temp2)
print(metascoresNew)

The output will be:

['7.6', '7.8', '7.0', '8.2'...]
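
Because replace() only matches the exact string "User:\n    " (a newline followed by four spaces), it would leave the prefix in place if the page's whitespace ever changed. As a possible variation, here is a minimal sketch that tolerates any amount of whitespace around the label. The helper name clean_score is hypothetical, and the sample values are copied from the output shown in the question.

```python
import re


def clean_score(raw: str) -> str:
    """Strip an optional leading 'User:' label and surrounding whitespace."""
    return re.sub(r"^\s*User:\s*", "", raw).strip()


# Sample values copied from the output shown in the question.
metascores = ['User:\n    7.6', 'User:\n    tbd', 'User:\n    4.1']
cleaned = [clean_score(s) for s in metascores]
print(cleaned)  # ['7.6', 'tbd', '4.1']
```

Strings that carry no "User:" label, such as a bare Metascore like '97', pass through unchanged apart from whitespace trimming.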

Regarding python-3.x - BeautifulSoup grabs the names but not the Metascores from the web page, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50891072/
