I have two DataFrames: train_family_sales
family store_nbr date unit_sales
0 GROCERY I 1.0 2016-08-01 3.0
1 GROCERY I 1.0 2016-08-02 10.0
2 GROCERY I 1.0 2016-08-04 3.0
3 AUTOMOTIVE 1.0 2016-08-05 5.0
4 AUTOMOTIVE 1.0 2016-08-06 5.0
and train_sales
date store_nbr item_nbr unit_sales family
0 2016-08-01 1.0 103520 3.0 GROCERY I
1 2016-08-02 1.0 103520 1.0 GROCERY I
2 2016-08-04 1.0 103520 6.0 GROCERY I
3 2016-08-05 1.0 103520 2.0 AUTOMOTIVE
4 2016-08-06 1.0 103520 2.0 AUTOMOTIVE
I want to merge them into the following:
date store_nbr item_nbr unit_sales family f_unit_sales
0 2016-08-01 1.0 103520 3.0 GROCERY I 3.0
1 2016-08-02 1.0 103520 1.0 GROCERY I 10.0
2 2016-08-04 1.0 103520 3.0 GROCERY I 3.0
3 2016-08-05 1.0 103520 2.0 AUTOMOTIVE 5.0
4 2016-08-06 1.0 103520 2.0 AUTOMOTIVE 6.0
I am trying the following:
both_sales = train_sales_with_family.join(train_family_sales,how='left', on=['store_nbr','family','date'], rsuffix='f_')
But I get an error: ValueError: len(left_on) must equal the number of levels in the index of "right"
Any suggestions on how to do this merge?
Best answer
I think you need merge:
both_sales = train_sales.merge(train_family_sales,
how='left',
on=['store_nbr','family','date'],
suffixes=('','_'))
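As a minimal, self-contained sketch of the merge approach, the snippet below rebuilds the two sample frames from the question and merges them. Two things here are assumptions rather than part of the answer: the dates are kept as plain strings, and the right-hand unit_sales column is renamed to f_unit_sales before merging so the result uses the column name the question asked for.

```python
import pandas as pd

dates = ["2016-08-01", "2016-08-02", "2016-08-04", "2016-08-05", "2016-08-06"]
families = ["GROCERY I", "GROCERY I", "GROCERY I", "AUTOMOTIVE", "AUTOMOTIVE"]

# Per-family totals (sample data copied from the question; dates kept as strings).
train_family_sales = pd.DataFrame({
    "family": families,
    "store_nbr": [1.0] * 5,
    "date": dates,
    "unit_sales": [3.0, 10.0, 3.0, 5.0, 5.0],
})

# Per-item sales (sample data copied from the question).
train_sales = pd.DataFrame({
    "date": dates,
    "store_nbr": [1.0] * 5,
    "item_nbr": [103520] * 5,
    "unit_sales": [3.0, 1.0, 6.0, 2.0, 2.0],
    "family": families,
})

# Assumption: rename the family-level column up front instead of using suffixes,
# so the merged frame carries the name requested in the question.
both_sales = train_sales.merge(
    train_family_sales.rename(columns={"unit_sales": "f_unit_sales"}),
    how="left",
    on=["store_nbr", "family", "date"],
)
print(both_sales)
```

Renaming first avoids the suffix handling entirely, because the only overlapping non-key column no longer collides.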
Or add set_index for join, because join needs the right-hand DataFrame to have a MultiIndex with the same levels as the columns in the on parameter:
both_sales = train_sales.join(train_family_sales.set_index(['store_nbr','family','date']),
on=['store_nbr','family','date'],
rsuffix='_')
print (both_sales)
date store_nbr item_nbr unit_sales family unit_sales_
0 2016-08-01 1.0 103520 3.0 GROCERY I 3.0
1 2016-08-02 1.0 103520 1.0 GROCERY I 10.0
2 2016-08-04 1.0 103520 6.0 GROCERY I 3.0
3 2016-08-05 1.0 103520 2.0 AUTOMOTIVE 5.0
4 2016-08-06 1.0 103520 2.0 AUTOMOTIVE 5.0
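To illustrate why the original join call fails and why set_index fixes it, here is a small runnable sketch using the sample data from the question. It assumes dates are stored as plain strings; the variable names keys, failed, and joined are introduced only for this illustration.

```python
import pandas as pd

dates = ["2016-08-01", "2016-08-02", "2016-08-04", "2016-08-05", "2016-08-06"]
families = ["GROCERY I", "GROCERY I", "GROCERY I", "AUTOMOTIVE", "AUTOMOTIVE"]

train_family_sales = pd.DataFrame({
    "family": families,
    "store_nbr": [1.0] * 5,
    "date": dates,
    "unit_sales": [3.0, 10.0, 3.0, 5.0, 5.0],
})
train_sales = pd.DataFrame({
    "date": dates,
    "store_nbr": [1.0] * 5,
    "item_nbr": [103520] * 5,
    "unit_sales": [3.0, 1.0, 6.0, 2.0, 2.0],
    "family": families,
})

keys = ["store_nbr", "family", "date"]

# join matches the `on` columns of the left frame against the INDEX of the
# right frame. The right frame still has its default single-level index here,
# so three key columns cannot be matched and pandas raises ValueError.
failed = False
try:
    train_sales.join(train_family_sales, how="left", on=keys, rsuffix="_")
except ValueError as exc:
    failed = True
    print("join failed:", exc)

# Moving the key columns into a three-level MultiIndex makes the shapes agree.
joined = train_sales.join(train_family_sales.set_index(keys), on=keys, rsuffix="_")
print(joined)
```

With the keys moved into the index, the only overlapping column left is unit_sales, so rsuffix='_' names the right-hand copy unit_sales_.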
Regarding "python - use groupby without creating a Series", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47830873/