python - Handling PerformanceWarning when creating new columns with map

Tags: python pandas

Full warning (the last line is the statement that triggered it):

"PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider using pd.concat instead. To get a de-fragmented frame, use newframe = frame.copy()"
  payouts[x] = ranking[x].map(prizes.set_index('Rank')['Payout'].to_dict())

import pandas as pd

lineups = range(1, 5)
prizes = {'Rank':[1, 2, 3], 'Payout':[100, 50, 25]}
prizes = pd.DataFrame(prizes)
payouts = pd.DataFrame(lineups, columns=['Lineup'])

ranking = {'Lineup':[1, 2, 3, 4], 1:[1, 2, 3, 4], 2:[2, 1, 4, 3], 3:[4, 1, 2, 3], 4:[1, 3, 4, 2]}
ranking = pd.DataFrame(ranking)

for x in range(1, 4):
    payouts[x] = ranking[x].map(prizes.set_index('Rank')['Payout'].to_dict())

payouts = payouts.fillna(-20)
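
The toy frames above are too small to trigger the warning: in recent pandas versions it is issued once a frame has been built up from more than roughly 100 separately inserted columns (an internal implementation detail that may change between versions). As a hedged sketch, the code below scales the question's loop up to a hypothetical 150 lineups filled with random ranks, records any PerformanceWarning, and compares it with collecting the mapped columns first and joining them in a single pd.concat, which is what the warning itself suggests. The sizes, the random data and the helper names perf_warned, build_with_loop and build_with_concat are made up for illustration.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical scale: 150 lineups with made-up random ranks (illustration only).
n = 150
rng = np.random.default_rng(0)
prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
ranking = pd.DataFrame({x: rng.integers(1, n + 1, size=n) for x in range(1, n + 1)})

# Build the rank -> payout lookup once, outside any loop.
mapper = prizes.set_index('Rank')['Payout'].to_dict()


def perf_warned(build):
    """Run build() and report whether it emitted a pandas PerformanceWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        result = build()
    warned = any(issubclass(w.category, pd.errors.PerformanceWarning) for w in caught)
    return result, warned


def build_with_loop():
    # The question's pattern: one new column inserted per iteration.
    out = pd.DataFrame({'Lineup': range(1, n + 1)})
    for x in range(1, n + 1):
        out[x] = ranking[x].map(mapper)
    return out.fillna(-20)


def build_with_concat():
    # Collect all new columns first, then join them in a single concat.
    new_cols = pd.DataFrame({x: ranking[x].map(mapper) for x in range(1, n + 1)})
    lineup = pd.DataFrame({'Lineup': range(1, n + 1)})
    return pd.concat([lineup, new_cols.fillna(-20)], axis=1)


slow, loop_warned = perf_warned(build_with_loop)
fast, concat_warned = perf_warned(build_with_concat)
print('loop warned:', loop_warned)
print('concat warned:', concat_warned)
print('same result:', slow.equals(fast))
```

On pandas versions that emit this warning, the loop flag is expected to be True while the concat flag stays False, and both frames hold the same values.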

Best answer

Instead of looping, we can create the mapper once, apply map to every column of ranking, and then concat the result with payouts:

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat(
    [payouts,
     # unmatched ranks map to NaN (float); fill them, then cast back to int
     ranking[range(1, 5)].apply(lambda s: s.map(mapper)).fillna(-20).astype(int)],
    axis=1
)
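
Passing a dict to Series.map turns every rank that has no key in mapper (rank 4 in this data) into NaN, which is why fillna(-20) is needed and why the mapped columns come out as float64. A possible variation, not taken from the original answer, is to pass a function that looks the rank up with dict.get and a default of -20, so no NaN appears and the columns keep an integer dtype. The helper name payout_for is hypothetical.

```python
import pandas as pd

prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
payouts = pd.DataFrame({'Lineup': range(1, 5)})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2],
})

mapper = prizes.set_index('Rank')['Payout'].to_dict()  # {1: 100, 2: 50, 3: 25}

# Dict lookup: rank 4 has no key, becomes NaN, and the column is upcast.
print(ranking[1].map(mapper).dtype)  # float64


# Hypothetical helper: fall back to -20 for any rank without a prize.
def payout_for(rank):
    return mapper.get(rank, -20)


payouts = pd.concat(
    [payouts, ranking[[1, 2, 3, 4]].apply(lambda s: s.map(payout_for))],
    axis=1,
)
print(payouts)
```

Calling a Python function per element is usually slower than the dict lookup on large frames, so this trades some speed for integer columns without a separate fillna step.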

Alternatively, we can replace the ranks using the mapper and then mask the values whose rank is beyond the highest paying rank:

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat(
    [payouts,
     ranking[range(1, 5)].replace(mapper)
         .mask(ranking.gt(prizes['Rank'].max()), -20)],
    axis=1
)
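
DataFrame.replace only changes values that appear as keys in mapper, so a rank without a prize (4 here) is left as the number 4 instead of becoming a payout; mask is the step that turns those cells into -20. The sketch below, using the same sample data, prints the intermediate result of each step; replaced and masked are just illustrative names.

```python
import pandas as pd

prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2],
})
mapper = prizes.set_index('Rank')['Payout'].to_dict()
cols = [1, 2, 3, 4]

# Step 1: replace known ranks with payouts; rank 4 has no key and stays 4.
replaced = ranking[cols].replace(mapper)
print(replaced[1].tolist())  # [100, 50, 25, 4]

# Step 2: cells whose original rank is beyond the last paid rank become -20.
# The condition is computed on `ranking`, not on `replaced`, because after
# replacement a payout such as 100 would also be "greater than 3".
masked = replaced.mask(ranking[cols].gt(prizes['Rank'].max()), -20)
print(masked[1].tolist())  # [100, 50, 25, -20]
```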

Both produce the following payouts:

   Lineup    1    2    3    4
0       1  100   50  -20  100
1       2   50  100  100   25
2       3   25  -20   50  -20
3       4  -20   25   25   50
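
One way to check that the two approaches agree is to build both results side by side with the sample data. Written without a cast, the map version ends up with float64 columns (the intermediate NaN forces the upcast) while the replace/mask version stays int64, so this sketch casts the former to int64 before comparing with DataFrame.equals. The names via_map, via_replace and last_paid_rank are illustrative.

```python
import pandas as pd

prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
payouts = pd.DataFrame({'Lineup': range(1, 5)})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2],
})
mapper = prizes.set_index('Rank')['Payout'].to_dict()
cols = [1, 2, 3, 4]
last_paid_rank = prizes['Rank'].max()  # 3

# Approach 1: map + fillna (unmatched ranks pass through NaN, so float64).
via_map = pd.concat(
    [payouts, ranking[cols].apply(lambda s: s.map(mapper)).fillna(-20)],
    axis=1,
)

# Approach 2: replace + mask (integer columns stay int64).
via_replace = pd.concat(
    [payouts, ranking[cols].replace(mapper).mask(ranking[cols].gt(last_paid_rank), -20)],
    axis=1,
)

print(via_map[1].dtype, via_replace[1].dtype)  # float64 int64
# Cast the float columns to int64 so the frames can be compared exactly.
print(via_map.astype('int64').equals(via_replace))  # True
```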

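*Note that in this example ranking already contains all the information needed to build the DataFrame, so payouts does not need to be initialized separately: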

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = ranking.copy()  # Create copy of ranking
cols = list(range(1, 5))
payouts[cols] = payouts[cols].apply(lambda s: s.map(mapper)).fillna(-20)

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = ranking.copy()  # Create copy of ranking
cols = list(range(1, 5))
payouts[cols] = (
    payouts[cols].replace(mapper).mask(ranking.gt(prizes['Rank'].max()), -20)
)

DataFrames and imports:

import pandas as pd

prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
payouts = pd.DataFrame({'Lineup': range(1, 5)})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2]
})

Regarding python - Handling PerformanceWarning when creating new columns with map, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68886708/
