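python - Pyplot scatter: name is not defined

Tags: python pandas matplotlib scatter

I have scraped data from a web page, and now I want to visualize it. When I try to make a scatter plot, I get the error "NameError: name 'x' is not defined" at plt.scatter(data[x],data[y]). I have looked through the scraping code, the scraped data, and my own code, and I am not sure why x and y don't work. Is there a fix?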

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
from pandas.core.indexes.base import Index 

text_color = 'w'

data = pd.read_csv(#filename)

fig, ax = plt.subplots(figsize=(13,8.5)) # create the figure

fig.set_facecolor('#22312b')

ax.patch.set_facecolor('#22312b')

pitch = Pitch(pitch_color='#aabb97', line_color='white')

pitch.draw(ax=ax)

plt.scatter(data[x],data[y])

The CSV file I read the data from is produced like this:

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

base_url = 'https://understat.com/match/'
match = input('Please enter the match id: ')
url = base_url + match

res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
scripts = soup.find_all('script')

strings = scripts[1].string

ind_start = strings.index("('")+2

ind_end = strings.index("')")

json_data = strings[ind_start:ind_end]
json_data = json_data.encode('utf8').decode('unicode_escape')

data = json.loads(json_data)

team = []
minute = []
xg = []
result = []
x = []
y = []
situation = []
player = []

data_away = data['a']
data_home = data['h']

for index in range(len(data_home)):
    for key in data_home[index]:
        if key == 'X':
            x.append(data_home[index][key])
        if key == 'Y':
            y.append(data_home[index][key])
        if key == 'xG':
            xg.append(data_home[index][key])
        if key == 'h_team':
            team.append(data_home[index][key])
        if key == 'result':
            result.append(data_home[index][key])
        if key == 'situation':
            situation.append(data_home[index][key])
        if key == 'minute':
            minute.append(data_home[index][key])
        if key == 'player':
            player.append(data_home[index][key])

for index in range(len(data_away)):
    for key in data_away[index]:
        if key == 'X':
            x.append(data_away[index][key])
        if key == 'Y':
            y.append(data_away[index][key])
        if key == 'xG':
            xg.append(data_away[index][key])
        if key == 'a_team':
            team.append(data_away[index][key])
        if key == 'result':
            result.append(data_away[index][key])
        if key == 'situation':
            situation.append(data_away[index][key])
        if key == 'minute':
            minute.append(data_away[index][key])
        if key == 'player':
            player.append(data_away[index][key])

col_names = ['Minute','Player','Situation','Team','xG','Result','x-coordinate','y-coordinate']
df = pd.DataFrame([minute,player,situation,team,xg,result,x,y], index=col_names)
df.to_csv('shotmaps.csv', encoding='utf-8')
df = df.T

This is my DataFrame:

         Unnamed: 0                    0                    1  ...                    30                    31                   32
0        Minute                    8                   10  ...                    78                    79                   86
1        Player    Cristiano Ronaldo    Cristiano Ronaldo  ...   Allan Saint-Maximin           Joe Willock            Joelinton
2     Situation             OpenPlay             OpenPlay  ...              OpenPlay            FromCorner             OpenPlay
3          Team    Manchester United    Manchester United  ...      Newcastle United      Newcastle United     Newcastle United
4            xG  0.05710771679878235  0.03967716544866562  ...  0.020885728299617767  0.013165773823857307  0.05987533554434776
5        Result          MissedShots          MissedShots  ...           BlockedShot             SavedShot          MissedShots
6  x-coordinate   0.9780000305175781   0.9719999694824218  ...    0.7390000152587891     0.705999984741211   0.9119999694824219
7  y-coordinate  0.33799999237060546                 0.72  ...   0.47900001525878905    0.4640000152587891   0.5929999923706055

Error message:

File "C:\Users\#name\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\indexes\base.py", line 2889, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'x-coordinate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "#filename", line 19, in <module>
    plt.scatter(data['x-coordinate'],data['y-coordinate'])
  File "#filename", line 2899, in __getitem__
    indexer = self.columns.get_loc(key)
  File "#filename", line 2891, in get_loc
    raise KeyError(key) from err
KeyError: 'x-coordinate'

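Best answer

Just because a variable is defined in one file that you run does not mean it is automatically available in another file that you run later. You need to pass the values along somehow, for example by importing one file from the other and calling a function that returns the values you are looking for.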

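However, the fix for this particular problem is much simpler. In your plotting file, just change

plt.scatter(data[x],data[y])

to

plt.scatter(data["x-coordinate"],data["y-coordinate"])

This uses the data in the DataFrame's named columns, which is what you want.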
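
To make the difference concrete, here is a minimal sketch, assuming a small hand-made DataFrame in place of the scraped CSV (the values are illustrative only). A bare name such as x has to exist as a variable in the current script, whereas a quoted string is looked up as a column label.

```python
import pandas as pd

# Hypothetical stand-in for the scraped shot data (illustrative values only).
data = pd.DataFrame({
    "x-coordinate": [0.97, 0.74, 0.91],
    "y-coordinate": [0.34, 0.48, 0.59],
})

# A bare name must be a variable defined in this script; x is not,
# so Python raises NameError before pandas is ever consulted.
try:
    data[x]
except NameError as err:
    name_error = str(err)

# A quoted string is treated as a column label and looked up in the frame.
xs = data["x-coordinate"]
ys = data["y-coordinate"]
```

The resulting xs and ys Series can be passed straight to plt.scatter.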


Edit

The fix above would work, except for one simple problem at the end of your scraping code:

df.to_csv('shotmaps.csv', encoding='utf-8')
df = df.T

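You are saving df to CSV and only then transposing it, so the file on disk still has one row per field and no column named x-coordinate, which is what the KeyError in your traceback is reporting. Swap those two lines, use my code above in your plotting file, and you should be good to go. I don't have mplsoccer installed, so I just commented out those lines.

  • df should look like the following example, created with match id 14620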
# display(df.head())

  Minute            Player Situation       Team                   xG       Result        x-coordinate         y-coordinate
0     13   Roberto Firmino  OpenPlay  Liverpool  0.03234297037124634  BlockedShot   0.774000015258789                 0.43
1     13  Andrew Robertson  OpenPlay  Liverpool  0.03856334835290909  MissedShots  0.8830000305175781   0.6880000305175781
2     16   Roberto Firmino  OpenPlay  Liverpool  0.07978218793869019  MissedShots               0.835    0.509000015258789
3     20   Xherdan Shaqiri  OpenPlay  Liverpool  0.04507734999060631  BlockedShot  0.7919999694824219  0.48900001525878906
4     21   Roberto Firmino  OpenPlay  Liverpool  0.09094344824552536  BlockedShot  0.9009999847412109    0.639000015258789
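
The effect of the line order can be checked without touching the disk. The following is a rough sketch, assuming a cut-down frame with four of the fields and an in-memory buffer instead of shotmaps.csv; round_trip is a hypothetical helper introduced only for this illustration.

```python
import io
import pandas as pd

# One list per field, as the scraper builds them (cut-down, illustrative values).
col_names = ["Minute", "Player", "x-coordinate", "y-coordinate"]
rows = [
    [13, 16],
    ["Roberto Firmino", "Roberto Firmino"],
    [0.774, 0.835],
    [0.43, 0.509],
]
df = pd.DataFrame(rows, index=col_names)  # field names end up as the row index


def round_trip(frame):
    """Hypothetical helper: write a frame to CSV in memory and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer)
    buffer.seek(0)
    return pd.read_csv(buffer)


untransposed = round_trip(df)   # saved before transposing, as in the question
transposed = round_trip(df.T)   # transposed first, then saved

print(list(untransposed.columns))
print(list(transposed.columns))
```

Only the transposed version comes back with x-coordinate and y-coordinate as column labels. Both versions also gain an Unnamed: 0 column from the saved index; passing index=False to to_csv on the transposed frame avoids that.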

Regarding python - Pyplot scatter: name is not defined, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/69154609/
