python - 使用 Pandas 对数据框进行列式映射和操作

标签 python pandas

我有两个数据帧,即 df1df2。我想对 df2 中的 New_Amount_Dollar 列执行操作。基本上在 df1 中,我有历史货币数据,我想执行从 df2CurrencyAmount_Dollar 的日期操作计算 df2New_Amount_Dollar 列的值。

例如,在 df2 中,我的第一种货币是 AUDDate = '01-01-2019',所以我想计算 New_Amount_Dollar 值使得

New_Amount_Dollar = Amount_Dollar/AUD 来自 df1 的值 即 New_Amount_Dollar = 19298/98 = 196.91

另一个示例,在 df2 中,我将第三种货币作为 COP 用于 Date = '03-01-2019,所以我想计算New_Amount_Dollar 值使得

New_Amount_Dollar = Amount_Dollar/COP 来自 df1 的值 即 New_Amount_Dollar = 5000/0.043 = 116279.06

import pandas as pd
data1 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','05-01-2019'],
        'AUD':[98, 98.5, 99, 99.5, 97],
        'BWP':[30,31,33,32,31],
        'CAD':[0.02,0.0192,0.0196,0.0196,0.0192],
        'BND':[0.99,0.952,0.970,0.980,0.970],
        'COP':[0.05,0.047,0.043,0.047,0.045]}
df1 = pd.DataFrame(data1)

data2 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','05-01-2019'],
        'Currency':['AUD','AUD','COP','CAD','BND'],
        'Amount_Dollar':[19298, 19210, 5000, 200, 2300],
        'New_Amount_Dollar':[0,0,0,0,0]
        }
df2 = pd.DataFrame(data2) 

df1

         Date   AUD  BWP     CAD    BND    COP
0  01-01-2019  98.0   30  0.0200  0.990  0.050
1  02-01-2019  98.5   31  0.0192  0.952  0.047
2  03-01-2019  99.0   33  0.0196  0.970  0.043
3  04-01-2019  99.5   32  0.0196  0.980  0.047
4  05-01-2019  97.0   31  0.0192  0.970  0.045
df2

         Date Currency  Amount_Dollar  New_Amount_Dollar
0  01-01-2019      AUD          19298                  0
1  02-01-2019      AUD          19210                  0
2  03-01-2019      COP           5000                  0
3  04-01-2019      CAD            200                  0
4  05-01-2019      BND           2300                  0

预期结果

         Date Currency  Amount_Dollar  New_Amount_Dollar
0  01-01-2019      AUD          19298             196.91
1  02-01-2019      AUD          19210             195.02
2  03-01-2019      COP           5000          116279.06
3  04-01-2019      CAD            200           10204.08
4  05-01-2019      BND           2300            2371.13

最佳答案

使用DataFrame.lookupDataFrame.set_index对于数组并划分 Amount_Dollar 列:

arr = df1.set_index('Date').lookup(df2['Date'], df2['Currency'])

df2['New_Amount_Dollar'] = df2['Amount_Dollar'] /  arr
print (df2)
         Date Currency  Amount_Dollar  New_Amount_Dollar
0  01-01-2019      AUD          19298         196.918367
1  02-01-2019      AUD          19210         195.025381
2  03-01-2019      COP           5000      116279.069767
3  04-01-2019      CAD            200       10204.081633
4  05-01-2019      BND           2300        2371.134021

但如果日期时间不匹配,则使用 DataFrame.asfreq :

import pandas as pd
data1 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', 
                 '04-01-2019','05-01-2019','08-01-2019'],
        'AUD':[98, 98.5, 99, 99.5, 97,100],
        'BWP':[30,31,33,32,31,20],
        'CAD':[0.02,0.0192,0.0196,0.0196,0.0192,0.2],
        'BND':[0.99,0.952,0.970,0.980,0.970,.23],
        'COP':[0.05,0.047,0.043,0.047,0.045,0.023]}
df1 = pd.DataFrame(data1)

data2 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','07-01-2019'],
        'Currency':['AUD','AUD','COP','CAD','BND'],
        'Amount_Dollar':[19298, 19210, 5000, 200, 2300],
        'New_Amount_Dollar':[0,0,0,0,0]
        }
df2 = pd.DataFrame(data2) 
print (df1)
         Date    AUD  BWP     CAD    BND    COP
0  01-01-2019   98.0   30  0.0200  0.990  0.050
1  02-01-2019   98.5   31  0.0192  0.952  0.047
2  03-01-2019   99.0   33  0.0196  0.970  0.043
3  04-01-2019   99.5   32  0.0196  0.980  0.047
4  05-01-2019   97.0   31  0.0192  0.970  0.045
5  08-01-2019  100.0   20  0.2000  0.230  0.023

print (df2)
         Date Currency  Amount_Dollar  New_Amount_Dollar
0  01-01-2019      AUD          19298                  0
1  02-01-2019      AUD          19210                  0
2  03-01-2019      COP           5000                  0
3  04-01-2019      CAD            200                  0
4  07-01-2019      BND           2300                  0

df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
df2['Date'] = pd.to_datetime(df2['Date'], dayfirst=True)

print (df1.set_index('Date').asfreq('D', method='ffill'))
              AUD  BWP     CAD    BND    COP
Date                                        
2019-01-01   98.0   30  0.0200  0.990  0.050
2019-01-02   98.5   31  0.0192  0.952  0.047
2019-01-03   99.0   33  0.0196  0.970  0.043
2019-01-04   99.5   32  0.0196  0.980  0.047
2019-01-05   97.0   31  0.0192  0.970  0.045
2019-01-06   97.0   31  0.0192  0.970  0.045
2019-01-07   97.0   31  0.0192  0.970  0.045
2019-01-08  100.0   20  0.2000  0.230  0.023

arr = df1.set_index('Date').asfreq('D', method='ffill').lookup(df2['Date'], df2['Currency'])

df2['New_Amount_Dollar'] = df2['Amount_Dollar'] /  arr
print (df2)
        Date Currency  Amount_Dollar  New_Amount_Dollar
0 2019-01-01      AUD          19298         196.918367
1 2019-01-02      AUD          19210         195.025381
2 2019-01-03      COP           5000      116279.069767
3 2019-01-04      CAD            200       10204.081633
4 2019-01-07      BND           2300        2371.134021

关于python - 使用 Pandas 对数据框进行列式映射和操作,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58692313/

相关文章:

python - 单击带有 selenium 和 python 的 href 按钮?

javascript - 按 html 按钮时在后台运行终端命令

python - 仅绘制频率大于 2 的字符串

python - 将包含不同长度列表的字典转换为 2 列数据帧

python - 从 python 守护进程发布到 crossbar.io

python - 用于访问用户 OneDrive 的 Microsoft Graph API 代码库解决方案

java - pythonic OpenCV中java图像的使用

Python Pandas 索引位置的值

python - 设置 pd.to_datetime 的格式

python - Pandas groupby 多个聚合,不包括焦点组