Python for loop: Optimize speed of code when replacing cat code with original values

I want to optimize the speed of the following code:

 true_results_str = true_results.astype(str)
 for i in range(len(true_results)):
     # Column 0 holds the user id, so start at column 1
     for k in range(1, true_results.shape[1]):
         item_id = true_results.iat[i, k]
         if not pd.isna(item_id):
             # item id -> cat_code (via mapping_item_id) -> _Type label (via df_)
             true_results_str.iat[i, k] = df_._Type[df_.cat_code == mapping_item_id[item_id]].item()

A sample of the true_results DataFrame looks like this:

      user   rec_2  rec_3  
0       16     nan    nan          
1       18      0      4          
2       51      3      0        
3       52      3     nan        
4       58      3      0

A sample of the mapping_item_id dictionary looks like this:

Key    Type   Size   Value
0      int64    1      13     
3      int64    1      14     
4      int64    1      15      
6      int64    1      16

A sample of the df_ DataFrame looks like this:

    _Type              cat_code
    Car                   13
    Shirt                 14
    Tops                  15
    Shoes                 16

The DataFrame after replacing each value with its cat_code from df_:

      user   rec_2  rec_3  
0       16     nan     nan         
1       18     13     15          
2       51     14     13        
3       52     14     nan        
4       58     14     13

The final true_results_str DataFrame should be:

      user   rec_2     rec_3  
0       16     nan      nan         
1       18     Car     Tops          
2       51     Shirt    Car        
3       52     Shirt    nan        
4       58     Shirt    Car   


Timing for test_results of length 44550:
Time Started at :  12262.5898183
Time Stopped at :  12317.2825541
Time Completed at :  54.692735799999355
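
The figures above read like two timer readings taken before and after the loop, with the last line being their difference (about 54.7 seconds). A minimal sketch of that measurement pattern follows. It assumes `time.perf_counter` was the timer, which the question does not state, and `timed` is a hypothetical helper name introduced only for illustration.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()   # reading before the work
    result = func(*args, **kwargs)
    stop = time.perf_counter()    # reading after the work
    return result, stop - start


# Stand-in workload; swap in the code being measured.
result, elapsed = timed(sum, range(1_000_000))
print(f"Elapsed: {elapsed:.3f} s")
```

Only the difference between the two readings is meaningful; the absolute values of `perf_counter` have no defined reference point.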

Best answer

You can build a final dictionary from mapping_item_id and df_, then select all columns except the first with DataFrame.iloc and, inside a lambda function passed to DataFrame.apply, map each column with Series.map:

import numpy as np

# cat_code -> _Type label
d = df_.set_index('cat_code')['_Type'].to_dict()

# Compose the two lookups: item id -> cat_code -> _Type (NaN if no match)
fin = {k: d.get(v, np.nan) for k, v in mapping_item_id.items()}
print(fin)
{0: 'Car', 3: 'Shirt', 4: 'Tops', 6: 'Shoes'}

# Map every column except the first (user) through fin
true_results.iloc[:, 1:] = true_results.iloc[:, 1:].apply(lambda x: x.map(fin))
print(true_results)
   user  rec_2 rec_3
0    16    NaN   NaN
1    18    Car  Tops
2    51  Shirt   Car
3    52  Shirt   NaN
4    58  Shirt   Car
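
To try the approach end to end, here is a self-contained sketch that rebuilds the sample data from the question and writes the labels into a separate `true_results_str` frame, as the question asked, instead of overwriting `true_results`. The data values are copied from the samples in the question; assembling the result with `pd.concat` is an assumption of this sketch, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
true_results = pd.DataFrame({
    "user": [16, 18, 51, 52, 58],
    "rec_2": [np.nan, 0, 3, 3, 3],
    "rec_3": [np.nan, 4, 0, np.nan, 0],
})
mapping_item_id = {0: 13, 3: 14, 4: 15, 6: 16}
df_ = pd.DataFrame({
    "_Type": ["Car", "Shirt", "Tops", "Shoes"],
    "cat_code": [13, 14, 15, 16],
})

# cat_code -> _Type, then item id -> _Type (NaN if a cat_code is unknown)
d = df_.set_index("cat_code")["_Type"].to_dict()
fin = {k: d.get(v, np.nan) for k, v in mapping_item_id.items()}

# Map all recommendation columns in one pass; NaN inputs stay NaN
mapped = true_results.iloc[:, 1:].apply(lambda s: s.map(fin))

# Keep the user column untouched and attach the mapped columns in a new frame
true_results_str = pd.concat([true_results[["user"]], mapped], axis=1)
print(true_results_str)
```

Building a new frame avoids assigning strings into float columns in place, and it leaves the original `true_results` available for comparison.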

Regarding "Python for loop: Optimize speed of code when replacing cat code with original values", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66356530/
