Someone on Stack Overflow suggested that I use pandas to label the values of a csv file, and provided the following code:
# original code
import pandas
cmf = pandas.read_csv('CMF_MA68II.csv', names=['wavelength', 'x', 'y', 'z'])
d65 = pandas.read_csv('D65_MA68II_10nm.csv', names=['wavelength', 'a', 'b'])
data = pandas.read_csv('spectral_data.csv', names=['serialNumber', 'wavelength', 'measurement', 'name'])
lookup = pandas.merge(cmf, d65, on='wavelength')
merged = pandas.merge(data, lookup, on='wavelength')
totals = ((lookup[['x', 'y', 'z']].T*lookup['a']).T).sum()
wps = 100 * totals/totals['y']
print(totals['y'])
print("D65_CMF_2006_10_deg white point = ")
print(wps)
I added this part at the end:
# here's my crappy part:
i = 0
for i in range(i, i+1), data['serialNumber']:
    x = ((merged.x * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())
    y = ((merged.y * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())
    z = ((merged.z * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())
    print(x, y, z)
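But these lines operate on every row of my file, regardless of which name each row belongs to, so the result is the average of all the individual measurements.
As you can see, the structure of the file 'spectral_data.csv' is names=['serialNumber', 'wavelength', 'measurement', 'name'].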
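What I want to do is perform this operation:
merged['X'] = (merged.x * merged.a * merged.measurement).sum()/totals['y']
separately for each series of data defined by name. That is, my file 'spectral_data.csv' contains several series of values, and I want to get the result for each of them and store the results in a CSV file with the structure ['serial number', 'X', 'Y', 'Z', 'name'].
Does anyone have a solution?
Thanks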
Sample files: “CMF_MA68II.csv”
400,1.879338E-02,2.589775E-03,8.508254E-02
410,8.277331E-02,1.041303E-02,3.832822E-01
420,2.077647E-01,2.576133E-02,9.933444E-01
430,3.281798E-01,4.698226E-02,1.624940E+00
440,4.026189E-01,7.468288E-02,2.075946E+00
450,3.932139E-01,1.039030E-01,2.128264E+00
460,3.013112E-01,1.414586E-01,1.768440E+00
470,1.914176E-01,1.999859E-01,1.310576E+00
480,7.593120E-02,2.682271E-01,7.516389E-01
490,1.400745E-02,3.554018E-01,3.978114E-01
500,5.652072E-03,4.780482E-01,2.078158E-01
510,3.778185E-02,6.248296E-01,8.852389E-02
520,1.201511E-01,7.788199E-01,3.784916E-02
530,2.380254E-01,8.829552E-01,1.539505E-02
540,3.841856E-01,9.665325E-01,6.083223E-03
550,5.374170E-01,9.907500E-01,2.323578E-03
560,7.123849E-01,9.944304E-01,8.779264E-04
570,8.933408E-01,9.640545E-01,3.342429E-04
580,1.034327E+00,8.775360E-01,1.298230E-04
590,1.147304E+00,7.869950E-01,5.207245E-05
600,1.148163E+00,6.629035E-01,2.175998E-05
610,1.048485E+00,5.282296E-01,9.530130E-06
620,8.629581E-01,3.950755E-01,0.000000E+00
630,6.413984E-01,2.751807E-01,0.000000E+00
640,4.323126E-01,1.776882E-01,0.000000E+00
650,2.714900E-01,1.083996E-01,0.000000E+00
660,1.538163E-01,6.033976E-02,0.000000E+00
670,8.281010E-02,3.211852E-02,0.000000E+00
680,4.221473E-02,1.628841E-02,0.000000E+00
690,2.025590E-02,7.797457E-03,0.000000E+00
700,9.816228E-03,3.776140E-03,0.000000E+00
“D65_MA68II_10nm.csv”
400,82.7549,14.708
410,91.486,17.6753
420,93.4318,20.995
430,86.6823,24.6709
440,104.865,28.7027
450,117.008,33.0859
460,117.812,37.8121
470,114.861,42.8693
480,115.923,48.2423
490,108.811,53.9132
500,109.354,59.8611
510,107.802,66.0635
520,104.79,72.4959
530,107.689,79.1326
540,104.405,85.947
550,104.046,92.912
560,100,100
570,96.3342,107.184
580,95.788,114.436
590,88.6856,121.731
600,90.0062,129.043
610,89.5991,136.346
620,87.6987,143.618
630,83.2886,150.836
640,83.6992,157.979
650,80.0268,165.028
660,80.2146,171.963
670,82.2778,178.769
680,78.2842,185.429
690,69.7213,191.931
700,71.6091,198.261
“spectral_data.csv”
0,400,12.73,"a"
0,410,12.41,"a"
0,420,12.55,"a"
0,430,13.42,"a"
0,440,15.07,"a"
0,450,17.31,"a"
0,460,19.20,"a"
0,470,20.96,"a"
0,480,22.11,"a"
0,490,23.45,"a"
0,500,24.62,"a"
0,510,25.42,"a"
0,520,24.51,"a"
0,530,22.43,"a"
0,540,20.94,"a"
0,550,21.59,"a"
0,560,22.36,"a"
0,570,21.54,"a"
0,580,22.03,"a"
0,590,28.86,"a"
0,600,37.02,"a"
0,610,42.00,"a"
0,620,44.79,"a"
0,630,46.57,"a"
0,640,47.56,"a"
0,650,48.70,"a"
0,660,49.90,"a"
0,670,50.75,"a"
0,680,51.53,"a"
0,690,52.24,"a"
0,700,53.00,"a"
1,400,2.31,"b"
1,410,2.33,"b"
1,420,2.33,"b"
1,430,2.30,"b"
1,440,2.29,"b"
1,450,2.30,"b"
1,460,2.27,"b"
1,470,2.26,"b"
1,480,2.24,"b"
1,490,2.23,"b"
1,500,2.22,"b"
1,510,2.21,"b"
1,520,2.20,"b"
1,530,2.19,"b"
1,540,2.18,"b"
1,550,2.18,"b"
1,560,2.18,"b"
1,570,2.16,"b"
1,580,2.15,"b"
1,590,2.14,"b"
1,600,2.14,"b"
1,610,2.13,"b"
1,620,2.12,"b"
1,630,2.11,"b"
1,640,2.11,"b"
1,650,2.11,"b"
1,660,2.10,"b"
1,670,2.08,"b"
1,680,2.07,"b"
1,690,2.06,"b"
1,700,2.04,"b"
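Best answer
This splits the calculation into three new columns and then groups by serial number and name (in this case you could actually group by either one, but this way you get both columns in the final result):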
# First calculate the new columns
cols = ['x', 'y', 'z']
uppercols = ['X', 'Y', 'Z']
for uppercol, col in zip(uppercols, cols):
    merged[uppercol] = (merged[col] * merged.a * merged.measurement)/totals['y']
# Now group and sum
sums = merged.groupby(['serialNumber', 'name'])[uppercols].sum()
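To see the grouping at work without the three files, here is a minimal self-contained sketch. It assumes only the first three wavelengths (400, 410, 420 nm) of each sample file from the question, typed in as DataFrames instead of being read with read_csv, so the numbers it prints are not the real tristimulus values for the full 400 to 700 nm range. The `totals` line uses `DataFrame.mul(..., axis=0)` as an equivalent of the double transpose in the question.

```python
import pandas as pd

# Assumption: only the first three rows of each sample file, entered by hand.
cmf = pd.DataFrame({
    'wavelength': [400, 410, 420],
    'x': [1.879338e-02, 8.277331e-02, 2.077647e-01],
    'y': [2.589775e-03, 1.041303e-02, 2.576133e-02],
    'z': [8.508254e-02, 3.832822e-01, 9.933444e-01],
})
d65 = pd.DataFrame({
    'wavelength': [400, 410, 420],
    'a': [82.7549, 91.486, 93.4318],
    'b': [14.708, 17.6753, 20.995],
})
data = pd.DataFrame({
    'serialNumber': [0, 0, 0, 1, 1, 1],
    'wavelength': [400, 410, 420, 400, 410, 420],
    'measurement': [12.73, 12.41, 12.55, 2.31, 2.33, 2.33],
    'name': ['a', 'a', 'a', 'b', 'b', 'b'],
})

lookup = pd.merge(cmf, d65, on='wavelength')
merged = pd.merge(data, lookup, on='wavelength')

# Illuminant-weighted totals of the colour matching functions.
totals = lookup[['x', 'y', 'z']].mul(lookup['a'], axis=0).sum()

# Per-row contributions; no .sum() yet, so every wavelength keeps its own row.
cols = ['x', 'y', 'z']
uppercols = ['X', 'Y', 'Z']
for uppercol, col in zip(uppercols, cols):
    merged[uppercol] = merged[col] * merged['a'] * merged['measurement'] / totals['y']

# Summing inside each (serialNumber, name) group yields one row per series.
sums = merged.groupby(['serialNumber', 'name'])[uppercols].sum()
print(sums)
```

The key difference from the loop in the question is that `.sum()` is applied after `groupby`, so rows from series "a" and series "b" are never pooled together.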
To write sums to a CSV file, just do:
sums.to_csv('test.csv')
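Written that way, the two grouping keys come first in the file, because they form the index of the grouped result. If the column order asked for in the question (serial number, X, Y, Z, name) matters, one possible sketch is to flatten the index and reorder the columns before writing. The values below are hypothetical placeholders standing in for the real grouped result, and the output goes to an in-memory buffer rather than to 'test.csv' so the snippet leaves no file behind.

```python
import io
import pandas as pd

# Hypothetical stand-in for the grouped result: same index levels and
# columns as `sums`, but with made-up numbers.
sums = pd.DataFrame(
    {'X': [1.0, 2.0], 'Y': [3.0, 4.0], 'Z': [5.0, 6.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 'a'), (1, 'b')], names=['serialNumber', 'name']
    ),
)

# Turn the index levels back into ordinary columns, then pick the order.
flat = sums.reset_index()[['serialNumber', 'X', 'Y', 'Z', 'name']]

# index=False keeps the automatic 0..n-1 row labels out of the file.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing `buffer` with a path such as 'test.csv' writes the same content to disk.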
Regarding "Python: pandas, parsing mathematical operations", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21435732/