In Pandas, I have a data frame of this form:
                    value
SampleGroup sample
Group1      ref      18.1
            smp1      NaN
            smp2     20.3
            smp3     30.0
            smp4     23.8
            smp5     23.2
What I want to do is add a new column in which the reference value (ref) has been subtracted from every sample (smp), like this:
                    value  deltaValue
SampleGroup sample
Group1      ref      18.1           0
            smp1      NaN         NaN
            smp2     20.3         2.2
            smp3     30.0        11.9
            smp4     23.8         5.7
            smp5     23.2         5.1
Does anyone know how to do this? Thanks!
Best answer
OK, I found the following, which works for me:
In [327]:
import io
import pandas as pd

# Rebuild the sample data from whitespace-separated text
t="""sample value
ref 18.1
smp1 NaN
smp2 20.3
smp3 30.0
smp4 23.8
smp5 23.2"""
df = pd.read_csv(io.StringIO(t), sep=r'\s+')
df
Out[327]:
  sample  value
0    ref   18.1
1   smp1    NaN
2   smp2   20.3
3   smp3   30.0
4   smp4   23.8
5   smp5   23.2
In [328]:
df['Group'] = 'Group1'
df
Out[328]:
  sample  value   Group
0    ref   18.1  Group1
1   smp1    NaN  Group1
2   smp2   20.3  Group1
3   smp3   30.0  Group1
4   smp4   23.8  Group1
5   smp5   23.2  Group1
In [329]:
df1 = df.set_index(['Group', 'sample'])
df1
Out[329]:
               value
Group  sample
Group1 ref      18.1
       smp1      NaN
       smp2     20.3
       smp3     30.0
       smp4     23.8
       smp5     23.2
In [337]:
df1['deltaValue'] = df1['value'].sub(df1.loc[('Group1', 'ref'), 'value'])
df1
Out[337]:
               value  deltaValue
Group  sample
Group1 ref      18.1         0.0
       smp1      NaN         NaN
       smp2     20.3         2.2
       smp3     30.0        11.9
       smp4     23.8         5.7
       smp5     23.2         5.1
The following also works:
df1['deltaValue'] = df1['value'] - df1.loc[('Group1', 'ref'), 'value']
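The lines above hard-code 'Group1'. If the frame held several groups, each with its own ref row, the same subtraction could be done per group. Here is a minimal sketch of one way to do that, assuming every group has exactly one ref row; the Group2 rows and the names raw, ref and ref_per_row are invented purely for illustration and are not part of the original question:

```python
import io
import pandas as pd

# Hypothetical data: Group2 is made up for this example.
raw = """SampleGroup sample value
Group1 ref 18.1
Group1 smp1 NaN
Group1 smp2 20.3
Group2 ref 10.0
Group2 smp1 12.5
Group2 smp2 9.0"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+").set_index(["SampleGroup", "sample"])

# Pick the 'ref' row of every group; the result is indexed by SampleGroup only.
ref = df["value"].xs("ref", level="sample")

# Look up each row's group reference, giving one reference value per row.
ref_per_row = ref.reindex(df.index.get_level_values("SampleGroup")).to_numpy()

# Subtract positionally; NaN samples stay NaN.
df["deltaValue"] = df["value"] - ref_per_row
print(df)
```

With a single group this should reduce to the same result as the answer above, since every row then looks up the same ref value.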
Regarding python - Calculating the difference from a reference row in pandas (python), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30238666/