python - Pandas - can only convert an array of size 1 to a Python scalar

I have two dataframes:

df_melt:

    MatchID GameWeek        Date                      Team  Home               AgainstTeam
0     46605        1  2019-08-09                 Liverpool  Home              Norwich City
1     46605        1  2019-08-09              Norwich City  Away                 Liverpool
2     46606        1  2019-08-10           AFC Bournemouth  Home          Sheffield United
3     46606        1  2019-08-10          Sheffield United  Away           AFC Bournemouth
4     46607        1  2019-08-10                   Burnley  Home               Southampton
..      ...      ...         ...                       ...   ...                       ...
533   46871       27  2020-02-23                   Watford  Away         Manchester United
534   46872       27  2020-02-22          Sheffield United  Home  Brighton and Hove Albion
535   46872       27  2020-02-22  Brighton and Hove Albion  Away          Sheffield United
536   46873       27  2020-02-22               Southampton  Home               Aston Villa
537   46873       27  2020-02-22               Aston Villa  Away               Southampton

And, for player matches, df_pm:

                                       Player  GameWeek  Minutes  ... CloseShotCreated TotalShotCreated  HeadersCreated
PlayerMatchesDetailID                                             ...                                                  
1                                     Alisson         1       90  ...                0                0               0
2                             Virgil van Dijk         1       90  ...                0                0               0
3                                Joseph Gomez         1       90  ...                0                1               0
4                            Andrew Robertson         1       90  ...                0                1               0
5                      Trent Alexander-Arnold         1       90  ...                3                3               1
...                                       ...       ...      ...  ...              ...              ...             ...
15053                             Matty James        22        0  ...                0                0               0
15054                             Matty James        23        0  ...                0                0               0
15055                             Matty James        24        0  ...                0                0               0
15056                             Matty James        25        0  ...                0                0               0
15057                             Matty James        26        0  ...                0                0               0

Now I am trying to iterate over df_pm and look up items in df_melt based on some conditions, like this:

#Instantiate an empty list
match_ids = []
home_away = []
dates = []

#For each row in the player matches dataframe...
for row in df_pm.itertuples():
    #Look up the match id from the team matches dataframe
    team = row.ForTeam
    againstteam = row.AgainstTeam
    gameweek = row.GameWeek

    match_id = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'MatchID'].item()

    date = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Date'].item()

    home = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Home'].item()

    #Add it to the list
    match_ids.append(match_id)
    home_away.append(home)
    dates.append(date)

But for every iteration, even though I can print "team", "againstteam" and "gameweek", I get the following error:

Traceback (most recent call last):
  File "tableau_data_generation.py", line 155, in <module>
    'MatchID'].item()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

...which suggests that the item does not exist.
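For context, `Series.item()` raises this ValueError whenever the selection does not contain exactly one element, so a lookup that matches no rows and a lookup that matches several rows fail in the same way. A minimal sketch with a made-up three-row frame (the data and the `try_item` helper are illustrative, not taken from the question's code):

```python
import pandas as pd

# Toy frame; the values are illustrative only.
df = pd.DataFrame({
    "GameWeek": [1, 1, 2],
    "Team": ["Liverpool", "Norwich City", "Liverpool"],
    "MatchID": [46605, 46605, 46700],
})

def try_item(mask):
    """Return (number of selected rows, the item or the error message)."""
    selected = df.loc[mask, "MatchID"]
    try:
        return len(selected), selected.item()
    except ValueError as exc:
        return len(selected), str(exc)

one = try_item((df["GameWeek"] == 1) & (df["Team"] == "Liverpool"))
zero = try_item((df["GameWeek"] == 3) & (df["Team"] == "Liverpool"))
many = try_item(df["Team"] == "Liverpool")

print(one)   # exactly one row selected, so .item() succeeds
print(zero)  # no rows selected, so .item() raises ValueError
print(many)  # two rows selected, so .item() raises ValueError as well
```

Checking `len(selected)` before calling `.item()` shows which of the two cases a failing row falls into.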

But when I print the full dataframe df_melt, like this:

with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(df_melt, df_melt.shape)

I get (538, 6) and can see that all the data is there, with nothing missing.

When I check the dtypes, I see:

df_melt:

MatchID        object
GameWeek       object
Date           object
Team           object
Home           object
AgainstTeam    object

df_pm:

Player                 object
GameWeek                int64
Minutes                 int64
ForTeam                object
AgainstTeam            object
Goals                   int64
ShotsOnTarget           int64
ShotsInBox              int64
CloseShots              int64
TotalShots              int64
Headers                 int64
GoalAssists             int64
ShotOnTargetCreated     int64
ShotInBoxCreated        int64
CloseShotCreated        int64
TotalShotCreated        int64
HeadersCreated          int64

So there is a type mismatch here: GameWeek is object in df_melt but int64 in df_pm.
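To illustrate why such a mismatch matters: if an object column holds strings, comparing it with an integer is False for every row, so the boolean mask selects nothing. A small sketch, assuming the GameWeek values in df_melt are strings such as "1" (the object dtype is consistent with that but does not prove it):

```python
import pandas as pd

# Toy column stored as strings (object dtype); values are illustrative.
game_week = pd.Series(["1", "1", "2"], dtype=object, name="GameWeek")

# Comparing strings with an integer is False for every row.
int_matches = int((game_week == 1).sum())

# Comparing with a string of the same form does match.
str_matches = int((game_week == "1").sum())

print(int_matches, str_matches)
```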

If I add the following line before running the iteration:

df_melt['GameWeek'] = pd.to_numeric(df_melt['GameWeek'])

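I successfully print dozens of "match_id", "date" and "home" values from df_pm.itertuples() (none were printed before I added that line), only for the loop to break again further on with the same error: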

ValueError: can only convert an array of size 1 to a Python scalar

How can I fix this?

Note: this is the code that follows the snippet above.

def matchid_lookup(player, date, team, gameweek):
    try:
        try:
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['Player']==player), 'MatchID'].item()
        except:
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['ForTeam']==team), 'MatchID'].iloc[0]
    except:
        return df_pm.loc[(df_pm['GameWeek']==gameweek)
                        &(df_pm['Player']==player), 'MatchID'].item()

#Declare the list as a column in the player matches df
df_pm['MatchID']=match_ids
df_pm['Date']=pd.to_datetime(dates)
df_pm['Home']=home_away
df_pm['Position']=df_pm['Player'].map(pos_lookup)

#Get the match IDs column first in the dataframe
cols = list(df_pm.columns)
new_cols = ['MatchID', 'Date', 'Home','Position'] + cols[:-4]
df_pm = df_pm[new_cols]

#Bring in stats from api table
#First, get key identifiers into the api table to facilitate joining
df_api_stats['Player'] = df_api_stats['PlayerID'].map(player_lookup)
df_api_stats['Team'] = df_api_stats['PlayerID'].map(team_lookup)    
df_api_stats['MatchID'] = df_api_stats.apply(lambda x: matchid_lookup(x['Player'],
                                                                      x['Date'],
                                                                      x['Team'],
                                                                      x['GameWeek']), axis=1)
api_cols = ['Player', 'MatchID', 'BPS', 'MinutesPlayed',
            'CleanSheet', 'Saves', 'NetTransfersIn',
            'SelectedBy', 'Points', 'Price']

df_api_cols = df_api_stats[api_cols]

Best answer

So there are some Date values in df_api_stats that are not in df_pm. You can see them with:

print (set(pd.to_datetime(df_api_stats['Date'])) - set(pd.to_datetime(df_pm['Date'])))
{Timestamp('2020-01-29 00:00:00'),
 Timestamp('2020-02-28 00:00:00'),
 Timestamp('2020-02-29 00:00:00'),
 Timestamp('2020-03-01 00:00:00'),
 Timestamp('2020-03-07 00:00:00'),
 Timestamp('2020-03-08 00:00:00'),
 Timestamp('2020-03-09 00:00:00')}

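I'm not sure how you want to handle the missing values, but to keep the function from failing you can add one more except and return nan when none of the possibilities match.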

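import numpy as np

def matchid_lookup(player, date, team, gameweek):
    try:
        try:
            # First choice: the player's own row on that date
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['Player']==player), 'MatchID'].item()
        except Exception:
            # Fallback: first row for the same team on that date
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['ForTeam']==team), 'MatchID'].iloc[0]
    except Exception:
        try:
            # Last resort: match on game week and player
            return df_pm.loc[(df_pm['GameWeek']==gameweek)
                            &(df_pm['Player']==player), 'MatchID'].item()
        except Exception:
            # Nothing matched: return NaN instead of raising
            return np.nan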
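If the nested try/except becomes hard to follow, the same fallback order could be written as a loop over candidate masks. This is only a sketch: `lookup_match_id` is a hypothetical name, the toy `df_pm` below stands in for the real frame, and taking the first of several matches (rather than failing over to the next rule, as `.item()` does) is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

def lookup_match_id(df_pm, player, date, team, gameweek):
    """Try each rule in order and return the first MatchID found, else NaN."""
    candidate_masks = [
        (df_pm["Date"] == date) & (df_pm["Player"] == player),
        (df_pm["Date"] == date) & (df_pm["ForTeam"] == team),
        (df_pm["GameWeek"] == gameweek) & (df_pm["Player"] == player),
    ]
    for mask in candidate_masks:
        hits = df_pm.loc[mask, "MatchID"]
        if len(hits):
            return hits.iloc[0]
    return np.nan

# Toy stand-in for df_pm; the values are illustrative only.
toy_pm = pd.DataFrame({
    "Player": ["Alisson", "Virgil van Dijk"],
    "ForTeam": ["Liverpool", "Liverpool"],
    "GameWeek": [1, 1],
    "Date": pd.to_datetime(["2019-08-09", "2019-08-09"]),
    "MatchID": [46605, 46605],
})

by_player = lookup_match_id(toy_pm, "Alisson", pd.Timestamp("2019-08-09"), "Liverpool", 1)
by_team = lookup_match_id(toy_pm, "Nobody", pd.Timestamp("2019-08-09"), "Liverpool", 5)
no_match = lookup_match_id(toy_pm, "Nobody", pd.Timestamp("2020-03-07"), "Watford", 30)
print(by_player, by_team, no_match)
```

Passing the frame as an argument, instead of reading a global, also makes the function easier to test in isolation.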

Note: just before the for loop that caused the problem earlier, don't forget to do this:

df_melt['GameWeek'] = pd.to_numeric(df_melt['GameWeek'])
df_melt[['Team', 'AgainstTeam']] = df_melt[['Team', 'AgainstTeam']]\
                                          .replace('AFC Bournemouth', 'Bournemouth')
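As an alternative to the row-by-row loop, the same lookup can be expressed as a left merge, which also makes rows without a match easy to spot. The sketch below uses small stand-in frames with placeholder player names, and assumes the join keys are GameWeek, ForTeam/Team and AgainstTeam as in the loop from the question.

```python
import pandas as pd

# Toy stand-ins for df_melt and df_pm; the values are illustrative only.
df_melt = pd.DataFrame({
    "MatchID": [46605, 46605, 46606],
    "GameWeek": [1, 1, 1],
    "Date": ["2019-08-09", "2019-08-09", "2019-08-10"],
    "Team": ["Liverpool", "Norwich City", "AFC Bournemouth"],
    "Home": ["Home", "Away", "Home"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United"],
})
df_pm = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "GameWeek": [1, 1, 1],
    "ForTeam": ["Liverpool", "Norwich City", "Bournemouth"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United"],
})

keys = ["GameWeek", "ForTeam", "AgainstTeam"]

# Rename so both frames share the same key column names.
lookup = df_melt.rename(columns={"Team": "ForTeam"})

# indicator=True adds a "_merge" column that flags rows without a match.
merged = df_pm.merge(lookup, how="left", on=keys, indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "ForTeam"].unique().tolist()
print(unmatched)

# Normalise the team name as in the answer, then merge again.
lookup[["ForTeam", "AgainstTeam"]] = (
    lookup[["ForTeam", "AgainstTeam"]].replace("AFC Bournemouth", "Bournemouth")
)
merged_fixed = df_pm.merge(lookup, how="left", on=keys, validate="many_to_one")
match_ids = merged_fixed["MatchID"].tolist()
print(match_ids)
```

`validate="many_to_one"` makes the merge raise if the lookup side has duplicate keys, which is the case that `.item()` would otherwise report as a size error.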

For more on python - Pandas - can only convert an array of size 1 to a Python scalar, see the similar question on Stack Overflow: https://stackoverflow.com/questions/61705173/
