Python: pandas, parsing math operations

Tags: python csv pandas

Someone on Stack Overflow suggested using pandas to process the values of my CSV files and provided the following code:

# original code
import pandas

cmf = pandas.read_csv('CMF_MA68II.csv', names=['wavelength', 'x', 'y', 'z'])
d65 = pandas.read_csv('D65_MA68II_10nm.csv', names=['wavelength', 'a', 'b'])
data = pandas.read_csv('spectral_data.csv', names=['serialNumber', 'wavelength', 'measurement', 'name'])

lookup = pandas.merge(cmf, d65, on='wavelength')
merged = pandas.merge(data, lookup, on='wavelength')

totals = ((lookup[['x', 'y', 'z']].T*lookup['a']).T).sum()
wps = 100 * totals/totals['y']

print(totals['y'])
print("D65_CMF_2006_10_deg white point = ")
print(wps)
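The transpose trick in totals multiplies each of the x, y and z columns row by row by the illuminant column a before summing. A minimal sketch of the same computation written with DataFrame.mul(..., axis=0), on a made-up three-wavelength table (the numbers are invented, not taken from the files), to show that the two forms agree:

```python
import pandas as pd

# Invented stand-in for lookup = merge(cmf, d65, on='wavelength')
lookup = pd.DataFrame({
    "wavelength": [400, 410, 420],
    "x": [0.1, 0.2, 0.3],
    "y": [0.5, 1.0, 0.5],
    "z": [0.9, 0.6, 0.3],
    "a": [80.0, 100.0, 90.0],
})

# Form used in the question: transpose, multiply by the Series 'a'
# (aligned on the row labels, now columns), transpose back, sum per column
totals_t = ((lookup[["x", "y", "z"]].T * lookup["a"]).T).sum()

# Same thing without transposes: multiply every row by a[row]
totals = lookup[["x", "y", "z"]].mul(lookup["a"], axis=0).sum()

# White point scaled so that its Y component is 100
wps = 100 * totals / totals["y"]
print(totals)
print(wps)
```

The mul form avoids the two transposes, which some find easier to read; both produce a Series indexed by x, y, z.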

I added this part at the end:

# here's my crappy part:

i = 0

for i in range(i, i+1), data['serialNumber']:
    x = ((merged.x * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())
    y = ((merged.y * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())
    z = ((merged.z * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())
    print(x, y, z)

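But these lines operate on all the rows of my file, regardless of which name they are associated with, so the result is effectively an average over all the individual measurements.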
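For the record, the loop header above iterates over the two-element tuple (range(0, 1), data['serialNumber']), so the body runs twice and every pass sums over all of merged. To get one result per series, each sum has to be restricted to the rows of one series. A minimal sketch of an explicit loop doing that with groupby, normalising by totals['y'] as in the desired operation; the three small input frames and all their values are invented so the snippet runs on its own, and the names rows and result are only illustrative:

```python
import pandas as pd

# Invented stand-ins for the three CSV files (2 wavelengths, 2 series)
cmf = pd.DataFrame({"wavelength": [400, 410], "x": [0.2, 0.4],
                    "y": [0.5, 1.0], "z": [0.8, 0.1]})
d65 = pd.DataFrame({"wavelength": [400, 410], "a": [80.0, 100.0],
                    "b": [15.0, 18.0]})
data = pd.DataFrame({"serialNumber": [0, 0, 1, 1],
                     "wavelength": [400, 410, 400, 410],
                     "measurement": [10.0, 20.0, 2.0, 3.0],
                     "name": ["a", "a", "b", "b"]})

lookup = pd.merge(cmf, d65, on="wavelength")
merged = pd.merge(data, lookup, on="wavelength")
totals = lookup[["x", "y", "z"]].mul(lookup["a"], axis=0).sum()

rows = []
# Each iteration only sees the rows of one (serialNumber, name) series
for (serial, name), grp in merged.groupby(["serialNumber", "name"]):
    weighted = grp["a"] * grp["measurement"]
    rows.append({
        "serialNumber": serial,
        "X": (grp["x"] * weighted).sum() / totals["y"],
        "Y": (grp["y"] * weighted).sum() / totals["y"],
        "Z": (grp["z"] * weighted).sum() / totals["y"],
        "name": name,
    })

result = pd.DataFrame(rows, columns=["serialNumber", "X", "Y", "Z", "name"])
print(result)
```

An explicit loop like this is easy to follow, but the vectorised groupby(...).sum() approach avoids building the rows by hand.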

As you can see, the structure of 'spectral_data.csv' is names=['serialNumber', 'wavelength', 'measurement', 'name'].

What I want to do is perform this operation:

merged['X'] = (merged.x * merged.a * merged.measurement).sum()/totals['y']

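for each series of data defined by name. My file 'spectral_data.csv' contains several series of values, and I want to get the result for each of them and store the results in a structure of the form ['serialNumber', 'X', 'Y', 'Z', 'name'].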
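Written out, with the column names used as symbols, λ running over the wavelengths, and m_s(λ) standing for the measurement column restricted to one series s (this notation is ours, not the asker's), the per-series operation is:

```latex
X_s = \frac{\sum_{\lambda} x(\lambda)\, a(\lambda)\, m_s(\lambda)}{\sum_{\lambda} y(\lambda)\, a(\lambda)}, \qquad
Y_s = \frac{\sum_{\lambda} y(\lambda)\, a(\lambda)\, m_s(\lambda)}{\sum_{\lambda} y(\lambda)\, a(\lambda)}, \qquad
Z_s = \frac{\sum_{\lambda} z(\lambda)\, a(\lambda)\, m_s(\lambda)}{\sum_{\lambda} y(\lambda)\, a(\lambda)}
```

The denominator is totals['y'], summed over the lookup table, so it is the same for every series; only the numerators have to be summed separately per series.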

Does anyone have a solution?

Thanks

Example files: "CMF_MA68II.csv"

400,1.879338E-02,2.589775E-03,8.508254E-02
410,8.277331E-02,1.041303E-02,3.832822E-01
420,2.077647E-01,2.576133E-02,9.933444E-01
430,3.281798E-01,4.698226E-02,1.624940E+00
440,4.026189E-01,7.468288E-02,2.075946E+00
450,3.932139E-01,1.039030E-01,2.128264E+00
460,3.013112E-01,1.414586E-01,1.768440E+00
470,1.914176E-01,1.999859E-01,1.310576E+00
480,7.593120E-02,2.682271E-01,7.516389E-01
490,1.400745E-02,3.554018E-01,3.978114E-01
500,5.652072E-03,4.780482E-01,2.078158E-01
510,3.778185E-02,6.248296E-01,8.852389E-02
520,1.201511E-01,7.788199E-01,3.784916E-02
530,2.380254E-01,8.829552E-01,1.539505E-02
540,3.841856E-01,9.665325E-01,6.083223E-03
550,5.374170E-01,9.907500E-01,2.323578E-03
560,7.123849E-01,9.944304E-01,8.779264E-04
570,8.933408E-01,9.640545E-01,3.342429E-04
580,1.034327E+00,8.775360E-01,1.298230E-04
590,1.147304E+00,7.869950E-01,5.207245E-05
600,1.148163E+00,6.629035E-01,2.175998E-05
610,1.048485E+00,5.282296E-01,9.530130E-06
620,8.629581E-01,3.950755E-01,0.000000E+00
630,6.413984E-01,2.751807E-01,0.000000E+00
640,4.323126E-01,1.776882E-01,0.000000E+00
650,2.714900E-01,1.083996E-01,0.000000E+00
660,1.538163E-01,6.033976E-02,0.000000E+00
670,8.281010E-02,3.211852E-02,0.000000E+00
680,4.221473E-02,1.628841E-02,0.000000E+00
690,2.025590E-02,7.797457E-03,0.000000E+00
700,9.816228E-03,3.776140E-03,0.000000E+00

"D65_MA68II_10nm.csv"

400,82.7549,14.708
410,91.486,17.6753
420,93.4318,20.995
430,86.6823,24.6709
440,104.865,28.7027
450,117.008,33.0859
460,117.812,37.8121
470,114.861,42.8693
480,115.923,48.2423
490,108.811,53.9132
500,109.354,59.8611
510,107.802,66.0635
520,104.79,72.4959
530,107.689,79.1326
540,104.405,85.947
550,104.046,92.912
560,100,100
570,96.3342,107.184
580,95.788,114.436
590,88.6856,121.731
600,90.0062,129.043
610,89.5991,136.346
620,87.6987,143.618
630,83.2886,150.836
640,83.6992,157.979
650,80.0268,165.028
660,80.2146,171.963
670,82.2778,178.769
680,78.2842,185.429
690,69.7213,191.931
700,71.6091,198.261

"spectral_data.csv"

0,400,12.73,"a"
0,410,12.41,"a"
0,420,12.55,"a"
0,430,13.42,"a"
0,440,15.07,"a"
0,450,17.31,"a"
0,460,19.20,"a"
0,470,20.96,"a"
0,480,22.11,"a"
0,490,23.45,"a"
0,500,24.62,"a"
0,510,25.42,"a"
0,520,24.51,"a"
0,530,22.43,"a"
0,540,20.94,"a"
0,550,21.59,"a"
0,560,22.36,"a"
0,570,21.54,"a"
0,580,22.03,"a"
0,590,28.86,"a"
0,600,37.02,"a"
0,610,42.00,"a"
0,620,44.79,"a"
0,630,46.57,"a"
0,640,47.56,"a"
0,650,48.70,"a"
0,660,49.90,"a"
0,670,50.75,"a"
0,680,51.53,"a"
0,690,52.24,"a"
0,700,53.00,"a"
1,400,2.31,"b"
1,410,2.33,"b"
1,420,2.33,"b"
1,430,2.30,"b"
1,440,2.29,"b"
1,450,2.30,"b"
1,460,2.27,"b"
1,470,2.26,"b"
1,480,2.24,"b"
1,490,2.23,"b"
1,500,2.22,"b"
1,510,2.21,"b"
1,520,2.20,"b"
1,530,2.19,"b"
1,540,2.18,"b"
1,550,2.18,"b"
1,560,2.18,"b"
1,570,2.16,"b"
1,580,2.15,"b"
1,590,2.14,"b"
1,600,2.14,"b"
1,610,2.13,"b"
1,620,2.12,"b"
1,630,2.11,"b"
1,640,2.11,"b"
1,650,2.11,"b"
1,660,2.10,"b"
1,670,2.08,"b"
1,680,2.07,"b"
1,690,2.06,"b"
1,700,2.04,"b"

Best answer

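This first computes three new columns and then groups by name and serial number (in this case you could actually group by either one, but this way you get both columns in the final result):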

# First calculate the new columns
cols = ['x', 'y', 'z']
uppercols = ['X', 'Y', 'Z']
for uppercol, col in zip(uppercols, cols):
    merged[uppercol] = (merged[col] * merged.a * merged.measurement)/totals['y']

# Now group and sum
sums = merged.groupby(['serialNumber', 'name'])[uppercols].sum()
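The parenthetical above relies on each serialNumber belonging to exactly one name in this data. A small self-contained check of that claim, running the answer's column computation on an invented two-series merged frame, with a made-up totals_y value standing in for totals['y'] (by_both, by_name and same are illustrative names):

```python
import pandas as pd

# Invented stand-in for `merged`: two series, two wavelengths each
merged = pd.DataFrame({
    "serialNumber": [0, 0, 1, 1],
    "name": ["a", "a", "b", "b"],
    "measurement": [10.0, 20.0, 2.0, 3.0],
    "a": [80.0, 100.0, 80.0, 100.0],
    "x": [0.2, 0.4, 0.2, 0.4],
    "y": [0.5, 1.0, 0.5, 1.0],
    "z": [0.8, 0.1, 0.8, 0.1],
})
totals_y = 140.0  # hypothetical value standing in for totals['y']

cols = ["x", "y", "z"]
uppercols = ["X", "Y", "Z"]
for uppercol, col in zip(uppercols, cols):
    merged[uppercol] = merged[col] * merged.a * merged.measurement / totals_y

by_both = merged.groupby(["serialNumber", "name"])[uppercols].sum()
by_name = merged.groupby("name")[uppercols].sum()

# With a one-to-one serialNumber/name mapping the sums are identical;
# only the index differs (MultiIndex vs. a plain 'name' index).
diff = by_both.reset_index(level="serialNumber", drop=True) - by_name
same = diff.abs().max().max() < 1e-12
print(by_both)
print(same)
```

If one serial number could carry several names (or vice versa), the two groupings would no longer agree, and grouping by both keys is the safer choice.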

To write it to a CSV file, just do:

sums.to_csv('test.csv')
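Here sums has serialNumber and name as a two-level index, so to_csv writes them as the first two columns. If the file should follow the layout asked for in the question (serialNumber, X, Y, Z, name) without a separate index column, one option is to turn the index back into columns and reorder them. A sketch with a hand-made sums frame (values invented), writing to an in-memory buffer instead of test.csv:

```python
import io

import pandas as pd

# Hand-made stand-in for `sums`, with the same (serialNumber, name) index
sums = pd.DataFrame(
    {"X": [6.86, 1.10], "Y": [17.14, 2.71], "Z": [3.00, 0.52]},
    index=pd.MultiIndex.from_tuples([(0, "a"), (1, "b")],
                                    names=["serialNumber", "name"]),
)

# Move the index levels into ordinary columns, then put 'name' last
flat = sums.reset_index()[["serialNumber", "X", "Y", "Z", "name"]]

buf = io.StringIO()
flat.to_csv(buf, index=False)  # index=False drops the 0..n-1 row labels
print(buf.getvalue())
```

With a real file, passing the path (for example 'test.csv') instead of buf works the same way.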

Regarding "Python: pandas, parsing math operations", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21435732/
