python - Manipulating a .dat file and plotting cumulative data

Tags: python numpy matplotlib dataframe data-manipulation

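I want to plot a quantity from a tedious .dat file whose #time column runs from 0 s to 70 s, but I need a closer look at the data (nuclear energy, in this case) between 25 s and 35 s.

Is there a way to manipulate the time column and the corresponding other columns so that only the data in the desired time span is recorded and plotted?

I already have some code that does the job for the full 0-70 s range: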

import matplotlib
matplotlib.use('Agg')

import os
import numpy as np
import matplotlib.pyplot as plt
import string
import math



# reads from flash.dat
def getQuantity(folder, basename, varlist):

        # quantities[0] should contain only the quantities of varlist[0]        
        quantities =[]
        for i in range(len(varlist)):
                quantities.append([])

        with open(folder + "/" + basename + ".dat", 'r') as f: # same as f = open(...) but closes the file afterwards.

                for line in f:
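                        if '#' not in line and 'Inf' not in line: # skip the header line, restart lines, and rows containing Infinity. Note: ('#' or 'Inf') evaluates to '#', so the original test never checked for 'Inf'.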

                                  for i in range(len(varlist)):
                                        if(varlist[i]==NUCLEAR_ENERGY and len(quantities[i])>0):
                                                quantities[i].append(float(line.split()[varlist[i]])+quantities[i][-1])
                                        else:
                                                quantities[i].append(float(line.split()[varlist[i]]))


        return quantities
# end def getQuantity

#create plot
plt.figure(1)

TIME = 0

NUCLEAR_ENERGY = 18

labels = ["time", "Nuclear Energy"]


flashFolder1 = '/home/trina/Pictures' # should be the flash NOT the flash/object folder.
lab1 = '176'


filename = 'flash' # 'flash' for flash.dat
nHorizontal = 1 # number of Plots in Horizontal Direction. Vertical Direction is set by program.
outputFilename = 'QuantityPlots_Nuclear.png'

variables = [NUCLEAR_ENERGY]


#Adjustments to set the size
nVertical = math.ceil(float(len(variables))/nHorizontal)   # = 6 for 16 = len(variables) & nHorizontal = 3.
F = plt.gcf()           #get figure
DPI = F.get_dpi()
DefaultSize = F.get_size_inches()
F.set_size_inches( DefaultSize[0]*nHorizontal, DefaultSize[1]*nVertical )       #build no of subplots in figure

variables.insert(0,TIME) # time as needed as well
data1 = getQuantity(flashFolder1, filename, variables)
time1 = np.array(data1[0])      #time is first column



for n in range(1, len(variables)): # starts at 1; index 0 holds the time column
        ax=plt.subplot(nVertical, nHorizontal, n)   #for example (6,3,0 to 15) inside loop for 16 variables
        if (min(data1[n])<0.0 or abs((min(data1[n]))/(max(data1[n])))>=1.e-2):
                plt.plot(time1, data1[n],label=lab1) #, label = labels[variables[n]])
                legend = ax.legend(loc='upper right', frameon=False)

        else:
                plt.semilogy(time1, data1[n],label=lab1) #, label = labels[variables[n]])
                legend = ax.legend(loc='upper right', frameon=False)

plt.savefig(outputFilename)

This is the plot I can generate with this code:

[Figure: plot produced by the code above]

For your convenience, I am also sharing the .dat file:

https://www.dropbox.com/s/w4jbxmln9e83355/flash.dat?dl=0

Any suggestions would be greatly appreciated.

Best answer

Update: cumulative nuclear energy plot (this reuses the df built in the old answer below):

x = df.query('25 <= time <= 35').set_index('time')
x['cum_nucl_energy'] = x.Nuclear_Energy.cumsum()
x.cum_nucl_energy.plot(figsize=(12,10))

[Figure: cumulative nuclear energy for 25 <= time <= 35]
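One point worth checking is where the accumulation starts. The snippet above sums only the rows inside the 25-35 window, whereas the script in the question adds each value to the running total from the first row of the file. The sketch below contrasts the two on synthetic data; the values are hypothetical stand-ins and are not taken from flash.dat.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for flash.dat (hypothetical values, not the real data).
df = pd.DataFrame({
    "time": np.arange(0.0, 70.0, 5.0),   # 0, 5, ..., 65
    "Nuclear_Energy": np.ones(14),       # one unit per sample
})

# Option A: restrict to the window first, then accumulate (as in the answer).
window = df.query("25 <= time <= 35").set_index("time")
cum_in_window = window["Nuclear_Energy"].cumsum()

# Option B: accumulate from the first row, then restrict to the window
# (closer to what the question's getQuantity does).
df["cum_from_start"] = df["Nuclear_Energy"].cumsum()
cum_from_start = df.query("25 <= time <= 35").set_index("time")["cum_from_start"]

print(cum_in_window.tolist())   # [1.0, 2.0, 3.0]
print(cum_from_start.tolist())  # [6.0, 7.0, 8.0]
```

Both curves have the same shape; option B is simply offset by everything accumulated before t = 25, so which one is appropriate depends on what the plot is meant to show.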

Old answer:

Using the pandas module:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

matplotlib.style.use('ggplot')

fn = r'D:\temp\.data\flash.dat'
df = pd.read_csv(fn, sep=r'\s+', usecols=[0, 18], header=None, skiprows=[0], na_values=['Infinity'])  # raw string avoids the invalid-escape warning for '\s'
df.columns=['time', 'Nuclear_Energy']
df.query('25 <= time <= 35').set_index('time').plot(figsize=(12,10))
plt.savefig('d:/temp/out.png')  # save before show(); once the window is closed the figure may be saved empty
plt.show()

Result:

[Figure: Nuclear_Energy versus time for 25 <= time <= 35]
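If you would rather stay with plain NumPy, as in the question's script, the same time window can be selected with a boolean mask. This is only a minimal sketch: select_window is a hypothetical helper name, and the arrays are made-up sample values standing in for the columns read from flash.dat.

```python
import numpy as np

def select_window(time, values, t_min, t_max):
    """Return the (time, values) samples with t_min <= time <= t_max."""
    time = np.asarray(time, dtype=float)
    values = np.asarray(values, dtype=float)
    mask = (time >= t_min) & (time <= t_max)  # element-wise, inclusive bounds
    return time[mask], values[mask]

# Hypothetical sample data (not taken from flash.dat).
time = np.array([0.0, 10.0, 25.0, 30.0, 35.0, 50.0, 70.0])
energy = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

t_win, e_win = select_window(time, energy, 25.0, 35.0)
print(t_win)  # [25. 30. 35.]
print(e_win)  # [2. 3. 4.]
```

In the question's script this could be applied to time1 and np.array(data1[n]) just before the plt.plot or plt.semilogy call.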

Explanation:

In [43]: pd.options.display.max_rows
Out[43]: 50

In [44]: pd.options.display.max_rows = 12

In [45]: df
Out[45]:
               time  Nuclear_Energy
0      0.000000e+00    0.000000e+00
1      1.000000e-07   -4.750169e+29
2      2.200000e-07   -5.699325e+29
3      3.640000e-07   -6.838392e+29
4      5.368000e-07   -8.206028e+29
5      7.441600e-07   -9.837617e+29
...             ...             ...
10210  6.046702e+01    7.160630e+44
10211  6.047419e+01    7.038907e+44
10212  6.048137e+01    6.934600e+44
10213  6.048856e+01    6.847015e+44
10214  6.049577e+01    6.765220e+44
10215  6.050298e+01    6.661930e+44

[10216 rows x 2 columns]

In [46]: df.query('25 <= time <= 35')
Out[46]:
           time  Nuclear_Energy
4534  25.001663    1.559398e+43
4535  25.006781    1.567793e+43
4536  25.011900    1.575844e+43
4537  25.017021    1.583984e+43
4538  25.022141    1.592015e+43
4539  25.027259    1.600200e+43
...         ...             ...
6521  34.966427    8.181516e+41
6522  34.972926    8.538806e+41
6523  34.979425    8.913695e+41
6524  34.985925    9.304403e+41
6525  34.992429    9.731310e+41
6526  34.998941    1.019862e+42

[1993 rows x 2 columns]

In [47]: df.query('25 <= time <= 35').set_index('time')
Out[47]:
           Nuclear_Energy
time
25.001663    1.559398e+43
25.006781    1.567793e+43
25.011900    1.575844e+43
25.017021    1.583984e+43
25.022141    1.592015e+43
25.027259    1.600200e+43
...                   ...
34.966427    8.181516e+41
34.972926    8.538806e+41
34.979425    8.913695e+41
34.985925    9.304403e+41
34.992429    9.731310e+41
34.998941    1.019862e+42

[1993 rows x 1 columns]
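The steps in the session above can be tried without the real file. The following sketch reads a tiny in-memory sample and shows that the query string is equivalent to an ordinary boolean mask. The sample text is hypothetical: it has only two columns, whereas the answer reads columns 0 and 18 of flash.dat.

```python
import io
import pandas as pd

# Hypothetical miniature of flash.dat: a '#' header line, whitespace-separated
# columns, and one 'Infinity' entry.
sample = io.StringIO(
    "#time nuclear_energy\n"
    "0.0   0.0\n"
    "24.9  1.0\n"
    "25.0  2.0\n"
    "30.0  Infinity\n"
    "35.0  4.0\n"
    "35.1  5.0\n"
)

# skiprows=[0] drops the header line; na_values turns 'Infinity' into NaN.
df = pd.read_csv(sample, sep=r"\s+", header=None, skiprows=[0],
                 names=["time", "Nuclear_Energy"], na_values=["Infinity"])

by_query = df.query("25 <= time <= 35")
by_mask = df[df["time"].between(25, 35)]  # inclusive on both ends by default

print(by_query.equals(by_mask))   # True
print(by_query["time"].tolist())  # [25.0, 30.0, 35.0]
```

The 'Infinity' row stays in the selection as NaN, which matplotlib leaves as a gap in the line rather than plotting it.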

Source: the Stack Overflow question "python - Manipulating a .dat file and plotting cumulative data": https://stackoverflow.com/questions/39561090/
