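python - Apply a lambda function after groupby based on the value of another column in pandas

I made a data frame to illustrate my problem. Suppose I have three patients: "a", "b", and "c". We have results for these patients at three time points (t1, t2, t3). What I need is another column, "fold", holding the fold change relative to t1. Since patient "c" has no result at t1, its fold change relative to t1 should be NaN. Here is the code: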

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time': np.repeat(['t1', 't2', 't3'], [2, 3, 3]),
    'id': ['a', 'b', 'a', 'b', 'c', 'a', 'b', 'c'],
    'result': np.random.randint(10, 20, size=8)})
# create indicator column has_t1: 1 if the patient has a t1 result, 0 if not
df['is_t1'] = np.where(df['time'] == 't1', 1, 0)
df['has_t1'] = df.groupby('id')['is_t1'].transform(sum)
# create fold change column (this is the line that raises the error)
df['fold'] = df.sort_values(['id', 'time']).groupby('id').apply(lambda x: x['result'] / x['result'].iloc[0] if x['has_t1'].iloc[0] == 1 else np.nan)

I get this error:

AttributeError: 'float' object has no attribute 'index'

My desired output looks like this:

             Fold
id time          
a  t1    1.000000
   t2    1.545455
   t3    1.000000
b  t1    1.000000
   t2    1.062500
   t3    0.937500
c  t2         NaN
   t3         NaN

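Does anyone know what I am doing wrong? Thanks in advance for your help.

Best answer

Here is an alternative that does not involve an indicator column. First unstack, then stack again without dropping the NaNs, so that every patient gets a row for every time point: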

df = df.set_index(['id', 'time']).unstack().stack(dropna=False) 
df

         result
id time        
a  t1      12.0
   t2      18.0
   t3      13.0
b  t1      13.0
   t2      11.0
   t3      13.0
c  t1       NaN
   t2      13.0
   t3      17.0
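The same "one row per patient and time point" frame can also be built with an explicit reindex, which may be useful if your pandas version warns about or rejects `stack(dropna=False)`. The following is only a sketch: the fixed `result` values replace the random ones so the output is reproducible, and the names `full_index` and `filled` are chosen here for illustration and do not come from the original answer.

```python
import pandas as pd

# Fixed illustrative values instead of np.random.randint (an assumption for reproducibility)
df = pd.DataFrame({
    'time': ['t1', 't1', 't2', 't2', 't2', 't3', 't3', 't3'],
    'id': ['a', 'b', 'a', 'b', 'c', 'a', 'b', 'c'],
    'result': [12, 13, 18, 11, 13, 13, 13, 17],
})

# Every (id, time) combination, including the missing ('c', 't1')
full_index = pd.MultiIndex.from_product(
    [sorted(df['id'].unique()), sorted(df['time'].unique())],
    names=['id', 'time'],
)

# reindex inserts NaN for combinations that have no row in the data
filled = df.set_index(['id', 'time']).reindex(full_index)
print(filled)
```

Here the row for patient "c" at t1 exists but holds NaN, which is what the next step relies on.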

Next, call groupby + transform + head to broadcast each patient's first row (the t1 value) across the group, and divide df.result by that output:

df['result'] /= df.groupby(level=0).result.transform('head', 1)    
df

           result
id time          
a  t1    1.000000
   t2    1.545455
   t3    1.000000
b  t1    1.000000
   t2    1.062500
   t3    0.937500
c  t1         NaN
   t2         NaN
   t3         NaN
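Because `transform('head', 1)` may not be accepted by every pandas version, here is a hedged sketch of a different route to the same fold change that avoids both the reshaping and the indicator column. It looks up each patient's t1 result and maps it back onto the rows by `id`. The fixed `result` values and the names `baseline` and `fold` are assumptions made for this example, not part of the original answer.

```python
import pandas as pd

# Fixed illustrative values instead of np.random.randint (an assumption for reproducibility)
df = pd.DataFrame({
    'time': ['t1', 't1', 't2', 't2', 't2', 't3', 't3', 't3'],
    'id': ['a', 'b', 'a', 'b', 'c', 'a', 'b', 'c'],
    'result': [12, 13, 18, 11, 13, 13, 13, 17],
})

# Baseline per patient: the result recorded at t1 (patient 'c' has none)
baseline = df.loc[df['time'] == 't1'].set_index('id')['result']

# map() yields NaN for ids missing from baseline, so every row of 'c' becomes NaN
df['fold'] = df['result'] / df['id'].map(baseline)

fold = df.set_index(['id', 'time'])['fold'].sort_index()
print(fold)
```

With this approach patient "c" keeps only its t2 and t3 rows, both NaN, which matches the layout of the desired output in the question.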

Original question on Stack Overflow: https://stackoverflow.com/questions/48878391/