python - How to generate a DataFrame using column values from another DataFrame

I am working with a dataset stored in the following DataFrame.

#print(old_df)
   col1  col2  col3
0     1    10   1.5
1     1    11   2.5
2     1    12   5.6
3     2    10   7.8
4     2    24   2.1
5     3    10   3.2
6     4    10  22.1
7     4    11   1.3
8     4    89   0.5
9     4    91   3.3

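I am trying to build another DataFrame that uses selected col1 values as the index and selected col2 values as the columns, filled with the corresponding col3 values.

For example: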

selected_col1 = [1,2]
selected_col2 = [10,11,24]

The new DataFrame should look like this:

#print(selected_df)
     10     11     24
1    1.5    2.5    NaN
2    7.8    NaN    2.1

I tried the following approach:

selected_col1 = [1,2]
selected_col2 = [10,11,24]
selected_df = pd.DataFrame(index=selected_col1, columns=selected_col2)
for col1_value in selected_col1:
    for col2_value in selected_col2:
        # Look up the rows matching this (col1, col2) pair
        qry = 'col1 == {} & col2 == {}'.format(col1_value, col2_value)
        col3_value = old_df.query(qry).col3.values
        if len(col3_value) > 0:
            selected_df.at[col1_value, col2_value] = col3_value[0]

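However, my DataFrame has about 20 million rows, so this brute-force approach takes a very long time. Is there a better way?

Best answer

First filter the rows by membership with Series.isin on both columns, combining the two boolean masks with & (bitwise AND), then reshape the result with DataFrame.pivot: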

df = df[df['col1'].isin(selected_col1) & df['col2'].isin(selected_col2)]

# Keyword arguments are required by pivot in pandas 2.0 and later
df = df.pivot(index='col1', columns='col2', values='col3')
print(df)
col2   10   11   24
col1               
1     1.5  2.5  NaN
2     7.8  NaN  2.1
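
For reference, here is a minimal end-to-end sketch of the filter-then-pivot approach, using the sample data from the question. It names the source frame old_df, as in the question, and assumes the value printed as "5,6" is meant to be 5.6. The final reindex call is an addition that is not part of the original answer: it keeps every selected value in the result, in the requested order, even when a value has no matching rows.

```python
import pandas as pd

# Sample data from the question (the value shown as "5,6" is assumed to be 5.6).
old_df = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "col2": [10, 11, 12, 10, 24, 10, 10, 11, 89, 91],
    "col3": [1.5, 2.5, 5.6, 7.8, 2.1, 3.2, 22.1, 1.3, 0.5, 3.3],
})

selected_col1 = [1, 2]
selected_col2 = [10, 11, 24]

# Keep only rows whose col1 AND col2 both appear in the selections.
mask = old_df["col1"].isin(selected_col1) & old_df["col2"].isin(selected_col2)

selected_df = (
    old_df[mask]
    .pivot(index="col1", columns="col2", values="col3")
    # Assumption beyond the original answer: reindex so that every selected
    # value appears as a row/column, even if nothing matched it.
    .reindex(index=selected_col1, columns=selected_col2)
)
print(selected_df)
```

Both isin and pivot are vectorized, so the source frame is scanned a fixed number of times instead of once per (col1, col2) pair as in the nested loop.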

If duplicate (col1, col2) pairs may remain after filtering, use DataFrame.pivot_table instead, which aggregates them:

df = df.pivot_table(index='col1',columns='col2',values='col3', aggfunc='mean')
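
To illustrate why duplicates matter, the sketch below uses a small made-up frame (dup_df is hypothetical, not the question's data) in which the pair (1, 10) occurs twice. pivot refuses to reshape it, while pivot_table collapses the duplicates with the chosen aggregation.

```python
import pandas as pd

# Hypothetical data: the pair (col1=1, col2=10) occurs twice.
dup_df = pd.DataFrame({
    "col1": [1, 1, 2],
    "col2": [10, 10, 24],
    "col3": [1.0, 3.0, 2.0],
})

# pivot cannot decide which value to keep for a duplicated pair.
try:
    dup_df.pivot(index="col1", columns="col2", values="col3")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (here: their mean).
averaged = dup_df.pivot_table(
    index="col1", columns="col2", values="col3", aggfunc="mean"
)
print(pivot_failed)
print(averaged)
```

Whether the mean is the right aggregation depends on the data; aggfunc also accepts other reducers such as 'first', 'max', or 'sum'.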

Edit:

Using | (bitwise OR) instead keeps every row that matches either selection, which gives a different output:

df = df[df['col1'].isin(selected_col1) | df['col2'].isin(selected_col2)]

# Keyword arguments are required by pivot in pandas 2.0 and later
df = df.pivot(index='col1', columns='col2', values='col3')
print(df)
col2    10   11   12   24
col1                     
1      1.5  2.5  5.6  NaN
2      7.8  NaN  NaN  2.1
3      3.2  NaN  NaN  NaN
4     22.1  1.3  NaN  NaN
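
The difference between the two masks can be checked directly by counting the rows each one keeps. This is a small sketch on the question's sample data, again assuming the value printed as "5,6" is 5.6; the names both and either are illustrative.

```python
import pandas as pd

# Sample data from the question (the value shown as "5,6" is assumed to be 5.6).
old_df = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "col2": [10, 11, 12, 10, 24, 10, 10, 11, 89, 91],
    "col3": [1.5, 2.5, 5.6, 7.8, 2.1, 3.2, 22.1, 1.3, 0.5, 3.3],
})

in_col1 = old_df["col1"].isin([1, 2])
in_col2 = old_df["col2"].isin([10, 11, 24])

both = in_col1 & in_col2    # row must match both selections
either = in_col1 | in_col2  # row may match just one of them

print(both.sum(), either.sum())
```

With this data the AND mask keeps 4 rows and the OR mask keeps 8, which is why the OR result contains the extra column 12 and the extra rows 3 and 4.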

Source: a similar question about generating a DataFrame from column values in another DataFrame, found on Stack Overflow: https://stackoverflow.com/questions/56376944/
