python - Creating a row based on values in other rows

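I have a DataFrame of meteorological data in which each row holds one day of data for one location. I want to compute 3-day statistics and add them as columns. The natural way (at least to me) is to use df.apply, but it is slow and uses a lot of memory (currently about 3 GB and still growing). My function looks like this (merged is the DataFrame, indexed by row number):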

def three_day_stats(row):
    total_snowfall = 0
    total_sunshine = 0
    mean_wind = 0
    mean_temp = 0
    days = range(max(0, row.name-3), row.name+1)
    for i in days:
        day = merged.loc[i]
        total_snowfall += day['Snowfall']
        total_sunshine += day['Sunshine duration']
        mean_wind += (1/len(days))*(day['10 metre U wind component']**2 + day['10 metre V wind component']**2)**0.5
        mean_temp += (1/len(days))*day['2 metre temperature']
    return pd.Series({'3 day snowfall': total_snowfall, 
                      '3 day sunshine': total_sunshine,
                      '3 day wind': mean_wind, 
                      '3 day temperature': mean_temp})

Is there a way to do this without apply? Or at least to make it more efficient?

Edit: one row of the data

10 metre U wind component                2.13432
10 metre V wind component              -0.932907
2 metre temperature                      3.88357
Date                         1996-11-01 00:00:00
Latitude                                 46.3975
Longitude                                 7.8515
Snow density                             269.103
Snow depth                           0.000514924
Snowfall                                       0
Sunshine duration                        2.87365
Temperature of snow layer              -0.677888
winter                                   2015/16
canton                                        VS
community                           Baltschieder
elevation                                   3440
aspect_string                                  E
Avalanche                                      0
Name: 0, dtype: object

Best answer

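You can use rolling with the sum and mean aggregations. First create the '3 day wind' column, which initially holds each day's wind speed and is then averaged over the window: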

import numpy as np
import pandas as pd

# Reproducible sample data: 10 consecutive days, random integers from 0 to 9
np.random.seed(100)
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=10)
cols = ['Snowfall', 'Sunshine duration', '10 metre U wind component',
        '10 metre V wind component', '2 metre temperature']
merged = pd.DataFrame(np.random.randint(10, size=(10, 5)), columns=cols, index=rng).reset_index()

print(merged)
       index  Snowfall  Sunshine duration  10 metre U wind component  \
0 2015-02-24         8                  8                          3   
1 2015-02-25         0                  4                          2   
2 2015-02-26         2                  2                          1   
3 2015-02-27         4                  0                          9   
4 2015-02-28         4                  1                          5   
5 2015-03-01         4                  3                          7   
6 2015-03-02         7                  7                          0   
7 2015-03-03         9                  3                          2   
8 2015-03-04         1                  0                          7   
9 2015-03-05         0                  8                          2   

   10 metre V wind component  2 metre temperature  
0                          7                    7  
1                          5                    2  
2                          0                    8  
3                          6                    2  
4                          3                    4  
5                          1                    1  
6                          2                    9  
7                          5                    8  
8                          6                    2  
9                          5                    1

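# Per-day wind speed from the U and V components; the rolling mean below averages it
merged['3 day wind'] = (merged['10 metre U wind component']**2 +
                        merged['10 metre V wind component']**2)**0.5
# A window of 4 rows is the current day plus the previous three,
# which matches range(row.name-3, row.name+1) in three_day_stats
df = merged.rolling(4, min_periods=1).agg({'Snowfall': 'sum',
                                           'Sunshine duration': 'sum',
                                           '2 metre temperature': 'mean',
                                           '3 day wind': 'mean'})
# Rename the aggregated columns; the temperature column keeps its original name
d = {"Snowfall": "3 day snowfall",
     "Sunshine duration": "3 day sunshine",
     "2 metre temperature": "2 metre temperature"}
df = df.rename(columns=d)
print(df)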
   3 day wind  3 day sunshine  3 day snowfall  2 metre temperature
0    7.615773             8.0             8.0             7.000000
1    6.500469            12.0             8.0             4.500000
2    4.666979            14.0            10.0             5.666667
3    6.204398            14.0            14.0             4.750000
4    5.758193             7.0            10.0             4.000000
5    6.179668             6.0            14.0             3.750000
6    6.429668            11.0            19.0             4.000000
7    5.071796            14.0            24.0             5.500000
8    5.918944            13.0            21.0             5.000000
9    5.497469            18.0            17.0             5.000000
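
The sample row in the question carries location columns (Latitude, Longitude, community), so if merged holds several locations stacked on top of each other, a rolling window over the whole frame would mix days from different locations at the group boundaries. Below is a minimal sketch of one way to keep the windows separate, assuming that community identifies a location and that Date orders the rows within it. The names add_rolling_stats, toy and the temporary wind speed column are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def add_rolling_stats(df, group_col="community", window=4):
    """Add per-location rolling statistics (hypothetical helper)."""
    # Rows must be in date order inside each location for rolling to make sense.
    out = df.sort_values([group_col, "Date"]).copy()

    # Daily wind speed from the U and V components (temporary column).
    out["wind speed"] = np.hypot(out["10 metre U wind component"],
                                 out["10 metre V wind component"])

    g = out.groupby(group_col)
    # transform returns a Series aligned with out, so it can be assigned directly.
    out["3 day snowfall"] = g["Snowfall"].transform(
        lambda s: s.rolling(window, min_periods=1).sum())
    out["3 day sunshine"] = g["Sunshine duration"].transform(
        lambda s: s.rolling(window, min_periods=1).sum())
    out["3 day wind"] = g["wind speed"].transform(
        lambda s: s.rolling(window, min_periods=1).mean())
    out["3 day temperature"] = g["2 metre temperature"].transform(
        lambda s: s.rolling(window, min_periods=1).mean())

    return out.drop(columns="wind speed")


# Made-up toy data: two locations "A" and "B", six consecutive days each.
gen = np.random.default_rng(0)
dates = pd.date_range("2015-02-24", periods=6)
n = 2 * len(dates)
toy = pd.DataFrame({
    "community": ["A"] * len(dates) + ["B"] * len(dates),
    "Date": list(dates) * 2,
    "Snowfall": gen.integers(0, 10, size=n).astype(float),
    "Sunshine duration": gen.integers(0, 10, size=n).astype(float),
    "10 metre U wind component": gen.normal(size=n),
    "10 metre V wind component": gen.normal(size=n),
    "2 metre temperature": gen.normal(size=n),
})

result = add_rolling_stats(toy)
print(result[["community", "Date", "Snowfall", "3 day snowfall"]])
```

With this layout the first row of each location starts a fresh window, so its 3 day snowfall equals that day's own Snowfall. The default window of 4 rows mirrors the range used in the question's function; pass window=3 if a strict three-row window is what is wanted.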

A similar question to python - Creating a row based on values in other rows can be found on Stack Overflow: https://stackoverflow.com/questions/41936756/
