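python - Calculating on segments of a DataFrame (in a loop)

I have two DataFrames, one short and one long. I want to split the long one into several blocks and compare each block with the short one using the correlation coefficient.

The split works fine. However, when I feed the blocks into the calculation, it returns NaN for all but the first block.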

import pandas as pd

data_a = {'ID': ["a1","a2","a3","a4","a5","a6","a7","a8","a9","a10","a11","a12","a13","a14","a15"], 
'Unit_Weight': [178,153,193,195,214,157,205,212,219,166,217,186,170,207,204]}

df_a = pd.DataFrame(data_a)

data_b = {'ID': ["b1","b2","b3","b4","b5"], 
'Unit_Weight': [128,123,123,125,204]}

df_b = pd.DataFrame(data_b)

size = 5      # number of rows per block taken from the long DataFrame
list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a),size)]

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].corr(df_b['Unit_Weight'])
    print(corr_e)

Output:

0.6797202605786716
nan
nan

What went wrong, and how can I fix it? Thanks.

P.S. These are the results calculated by hand, followed by the three blocks:

0.6797202605786716
-0.5501914564062937
0.2653370297540246

   ID  Unit_Weight
0  a1          178
1  a2          153
2  a3          193
3  a4          195
4  a5          214
    ID  Unit_Weight
5   a6          157
6   a7          205
7   a8          212
8   a9          219
9  a10          166
     ID  Unit_Weight
10  a11          217
11  a12          186
12  a13          170
13  a14          207
14  a15          204

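Best answer

Series.corr aligns the two Series on their index labels before computing, so both must have the same index. The first block has labels 0 to 4, which match df_b, but the second and third blocks keep labels 5 to 9 and 10 to 14, so no values pair up and the result is NaN. Reset each block's index with Series.reset_index and drop=True: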

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].reset_index(drop=True).corr(df_b['Unit_Weight'])
    print(corr_e)

0.6797202605786716
-0.5501914564062937
0.26533702975402457
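
To see the alignment issue directly, here is a minimal sketch that inspects the index labels of one block and then computes the correlation by position instead of by label. It rebuilds the data from the question without the ID column; the names chunks, second, shared and results are only illustrative, and np.corrcoef on plain arrays is an alternative to the reset_index approach above, not a replacement for it.

```python
import numpy as np
import pandas as pd

# Same Unit_Weight values as in the question (ID column left out for brevity).
df_a = pd.DataFrame({"Unit_Weight": [178, 153, 193, 195, 214,
                                     157, 205, 212, 219, 166,
                                     217, 186, 170, 207, 204]})
df_b = pd.DataFrame({"Unit_Weight": [128, 123, 123, 125, 204]})

size = 5
# iloc slices by position; each block still keeps its original index labels.
chunks = [df_a.iloc[i:i + size] for i in range(0, len(df_a), size)]

# The second block carries labels 5..9, while df_b carries labels 0..4.
second = chunks[1]["Unit_Weight"]
print(list(second.index))
print(list(df_b.index))

# Series.corr only uses labels present in both Series; here there are none.
shared = second.index.intersection(df_b.index)
print(len(shared))
print(second.corr(df_b["Unit_Weight"]))  # NaN, because nothing lines up

# Converting to NumPy arrays drops the labels, so values pair up by position.
results = []
for chunk in chunks:
    r = np.corrcoef(chunk["Unit_Weight"].to_numpy(),
                    df_b["Unit_Weight"].to_numpy())[0, 1]
    results.append(r)
    print(r)
```

The three values printed by the loop should agree with the output above up to floating-point rounding. The positional version assumes every block has exactly as many rows as df_b; a shorter final block would make np.corrcoef raise an error rather than return NaN.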

For more on python - calculating on segments of a DataFrame (in a loop), see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/56106273/
