python - 如何让我的 python 代码运行得更快

标签 python performance loops timing

我正在编写循环遍历多个 netcdf 文件(大~28G)的代码。 netcdf 文件在整个域中具有多个 4D 变量 [时间、东西、南北、高度]。目标是遍历这些文件并遍历域中所有这些变量的每个位置,并将某些变量提取到一个大数组中。当文件丢失或不完整时,我用 99.99 填充这些值。现在我只是通过循环超过 2 个每日 netcdf 文件来进行测试,但由于某种原因它会永远(~14 小时)。我不确定是否有优化此代码的方法。我不认为 python 应该花这么长时间来完成这项任务,但也许这是 python 或我的代码的问题。下面是我的代码,希望它是可读的,并且非常感谢任何关于如何使它更快的建议:

#Domain to loop over
k_space = np.arange(0,37)
j_space = np.arange(80,170)
i_space = np.arange(200,307)

predictors_wrf=[]
names_wrf=[]

counter = 0
cdate = start_date
while cdate <= end_date:
    if cdate.month not in month_keep:
        cdate+=inc
        continue
    yy = cdate.strftime('%Y')        
    mm = cdate.strftime('%m')
    dd = cdate.strftime('%d')
    filename = wrf_path+'\wrfoutRED_d01_'+yy+'-'+mm+'-'+dd+'_'+hour_str+'_00_00'
    for i in i_space:
        for j in j_space:
            for k in k_space:
                    if os.path.isfile(filename):
                        f = nc.Dataset(filename,'r')
                        times = f.variables['Times'][1:]
                        num_lines = times.shape[0]
                        if num_lines == 144:
                            u = f.variables['U'][1:,k,j,i]
                            v = f.variables['V'][1:,k,j,i]
                            wspd = np.sqrt(u**2.+v**2.)
                            w = f.variables['W'][1:,k,j,i]
                            p = f.variables['P'][1:,k,j,i]
                            t = f.variables['T'][1:,k,j,i]
                        if num_lines < 144:
                            print "partial files for WRF: "+ filename
                            u = np.ones((144,))*99.99
                            v = np.ones((144,))*99.99
                            wspd = np.ones((144,))*99.99
                            w = np.ones((144,))*99.99
                            p = np.ones((144,))*99.99
                            t = np.ones((144,))*99.99
                    else:
                        u = np.ones((144,))*99.99
                        v = np.ones((144,))*99.99
                        wspd = np.ones((144,))*99.99
                        w = np.ones((144,))*99.99
                        p = np.ones((144,))*99.99
                        t = np.ones((144,))*99.99
                        counter=counter+1
                    predictors_wrf.append(u)
                    predictors_wrf.append(v)
                    predictors_wrf.append(wspd)
                    predictors_wrf.append(w)
                    predictors_wrf.append(p)
                    predictors_wrf.append(t)
                    u_names = 'u_'+str(k)+'_'+str(j)+'_'+str(i)
                    v_names = 'v_'+str(k)+'_'+str(j)+'_'+str(i)
                    wspd_names = 'wspd_'+str(k)+'_'+str(j)+'_'+str(i)
                    w_names = 'w_'+str(k)+'_'+str(j)+'_'+str(i)
                    p_names = 'p_'+str(k)+'_'+str(j)+'_'+str(i)
                    t_names = 't_'+str(k)+'_'+str(j)+'_'+str(i)
                    names_wrf.append(u_names)
                    names_wrf.append(v_names)
                    names_wrf.append(wspd_names)
                    names_wrf.append(w_names)
                    names_wrf.append(p_names)
                    names_wrf.append(t_names)
    cdate+=inc

最佳答案

这是收紧您的 forloop 的第一关。由于每个文件只使用一次文件形状,因此可以将处理移到循环之外,这将减少中断处理中的数据加载量。我仍然不明白 counterinc 做了什么,因为它们似乎没有在循环中更新。您肯定想研究重复的字符串连接性能,或者将附加到 predictors_wrfnames_wrf 的性能作为起点

k_space = np.arange(0,37)
j_space = np.arange(80,170)
i_space = np.arange(200,307)

predictors_wrf=[]
names_wrf=[]

counter = 0
cdate = start_date
while cdate <= end_date:
    if cdate.month not in month_keep:
        cdate+=inc
        continue
    yy = cdate.strftime('%Y')        
    mm = cdate.strftime('%m')
    dd = cdate.strftime('%d')
    filename = wrf_path+'\wrfoutRED_d01_'+yy+'-'+mm+'-'+dd+'_'+hour_str+'_00_00'
    file_exists = os.path.isfile(filename)
    if file_exists:
        f = nc.Dataset(filename,'r')
        times = f.variables['Times'][1:]
        num_lines = times.shape[0]
    for i in i_space:
        for j in j_space:
            for k in k_space:
                    if file_exists:    
                        if num_lines == 144:
                            u = f.variables['U'][1:,k,j,i]
                            v = f.variables['V'][1:,k,j,i]
                            wspd = np.sqrt(u**2.+v**2.)
                            w = f.variables['W'][1:,k,j,i]
                            p = f.variables['P'][1:,k,j,i]
                            t = f.variables['T'][1:,k,j,i]
                        if num_lines < 144:
                            print "partial files for WRF: "+ filename
                            u = np.ones((144,))*99.99
                            v = np.ones((144,))*99.99
                            wspd = np.ones((144,))*99.99
                            w = np.ones((144,))*99.99
                            p = np.ones((144,))*99.99
                            t = np.ones((144,))*99.99
                    else:
                        u = np.ones((144,))*99.99
                        v = np.ones((144,))*99.99
                        wspd = np.ones((144,))*99.99
                        w = np.ones((144,))*99.99
                        p = np.ones((144,))*99.99
                        t = np.ones((144,))*99.99
                        counter=counter+1
                    predictors_wrf.append(u)
                    predictors_wrf.append(v)
                    predictors_wrf.append(wspd)
                    predictors_wrf.append(w)
                    predictors_wrf.append(p)
                    predictors_wrf.append(t)
                    u_names = 'u_'+str(k)+'_'+str(j)+'_'+str(i)
                    v_names = 'v_'+str(k)+'_'+str(j)+'_'+str(i)
                    wspd_names = 'wspd_'+str(k)+'_'+str(j)+'_'+str(i)
                    w_names = 'w_'+str(k)+'_'+str(j)+'_'+str(i)
                    p_names = 'p_'+str(k)+'_'+str(j)+'_'+str(i)
                    t_names = 't_'+str(k)+'_'+str(j)+'_'+str(i)
                    names_wrf.append(u_names)
                    names_wrf.append(v_names)
                    names_wrf.append(wspd_names)
                    names_wrf.append(w_names)
                    names_wrf.append(p_names)
                    names_wrf.append(t_names)
    cdate+=inc

关于python - 如何让我的 python 代码运行得更快,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42382666/

相关文章:

php - 从数据库中填充选择字段

Java删除数组中的冗余项

python - Numpy.where() 在其条件中包含一个数组

python - 在 python 中编写类时,首选 mixins 还是对象属性?

MySQL Performance Schema 语句 DIGEST

.net - IronPython 性能

multithreading - 多线程环境中的Malloc性能

python - 发送带有 python SSL 错误的电子邮件

python - 如何将 django 连接到 docker redis 容器?

C++在while循环中更改变量