I have a pandas DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({"AAA": ["x1", "x1", "x1", "x2", "x2", "x2"],
                   "BBB": ["y1", "y1", "y2", "y2", "y2", "y1"],
                   "CCC": ["t1", "t2", "t3", "t1", "t1", "t1"],
                   "DDD": [10, 11, 18, 17, 21, 30]})
Out[1]:
  AAA BBB CCC  DDD
0  x1  y1  t1   10
1  x1  y1  t2   11
2  x1  y2  t3   18
3  x2  y2  t1   17
4  x2  y2  t1   21
5  x2  y1  t1   30
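Problem
I want to group by column AAA, so that I have two groups: x1 and x2.
For each group, I want to calculate the ratio of y1 to y2 in column BBB,
and assign this output to a new column, Ratio of BBB.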
Desired output
So this is the output I want:
pd.DataFrame({"AAA": ["x1", "x1", "x1", "x2", "x2", "x2"],
              "BBB": ["y1", "y1", "y2", "y2", "y2", "y1"],
              "CCC": ["t1", "t2", "t3", "t1", "t1", "t1"],
              "DDD": [10, 11, 18, 17, 21, 30],
              "Ratio of BBB": [0.33, 0.33, 0.33, 0.66, 0.66, 0.66]})
Out[2]:
  AAA BBB CCC  DDD  Ratio of BBB
0  x1  y1  t1   10          0.33
1  x1  y1  t2   11          0.33
2  x1  y2  t3   18          0.33
3  x2  y2  t1   17          0.66
4  x2  y2  t1   21          0.66
5  x2  y1  t1   30          0.66
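Working the desired numbers out by hand helps pin down what "ratio" means here. Each group has three rows, and the values in the desired output match the share of y2 within the group rather than y1/y2 (which would give 2 for x1 and 0.5 for x2). Writing $n_{g,v}$ for the number of rows in group $g$ whose BBB value is $v$ (notation introduced only for this check), a sketch of the arithmetic is:

$$
r_g = \frac{n_{g,\mathrm{y2}}}{n_{g,\mathrm{y1}} + n_{g,\mathrm{y2}}},\qquad
r_{\mathrm{x1}} = \frac{1}{2 + 1} = \tfrac{1}{3} \approx 0.33,\qquad
r_{\mathrm{x2}} = \frac{2}{1 + 2} = \tfrac{2}{3} \approx 0.67
$$

The 0.66 in the table above appears to be 2/3 truncated rather than rounded.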
Current state
Currently I achieve this as follows:
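def f(df):
    # Count y1 and y2 within this group; the scalar is broadcast to every row.
    df["y1"] = (df["BBB"] == "y1").sum()
    df["y2"] = (df["BBB"] == "y2").sum()
    # Note: y2 / y1 gives 0.5 for x1 and 2.0 for x2, not the 0.33 / 0.66 shown above.
    df["Ratio of BBB"] = df["y2"] / df["y1"]
    return df

df.groupby(df.AAA).apply(f)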
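The same per-group counts that `f` stores in its temporary y1/y2 columns can be computed without `groupby().apply()`. The sketch below assumes the target is the share of y2 among each group's y1/y2 rows (the reading the answer below takes); `flags` and `counts` are hypothetical helper names, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({"AAA": ["x1", "x1", "x1", "x2", "x2", "x2"],
                   "BBB": ["y1", "y1", "y2", "y2", "y2", "y1"],
                   "CCC": ["t1", "t2", "t3", "t1", "t1", "t1"],
                   "DDD": [10, 11, 18, 17, 21, 30]})

# 0/1 indicator columns: one per BBB value we want to count.
flags = pd.DataFrame({"y1": df["BBB"].eq("y1").astype(int),
                      "y2": df["BBB"].eq("y2").astype(int)})

# Sum the indicators within each AAA group; transform("sum") repeats the
# group total on every row, much like the scalar assignments inside f.
counts = flags.groupby(df["AAA"]).transform("sum")

# Share of y2 among the y1/y2 rows of each group (assumed definition of "ratio").
df["Ratio of BBB"] = counts["y2"] / (counts["y1"] + counts["y2"])
print(df)
```

Because `transform` keeps the original row index, the result lines up with `df` directly and the helper y1/y2 columns never land in the output.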
What I want to achieve
Is it possible to do this with the .pipe() function?
I was thinking of something like this:
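ratio = (df
         .groupby(df.AAA)  # groupby a column not included in the current series (df.colname)
         .BBB
         .value_counts()   # counts per (AAA, BBB) pair; the index has an "AAA" and a "BBB" level
         .pipe(lambda s: s.xs("y2", level="BBB") / s.xs("y1", level="BBB"))  # y2 count / y1 count per AAA
         )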
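One way to keep `.pipe()` and still get a value on every row is to pipe the whole DataFrame rather than the counts Series. A minimal sketch, again assuming the target is the share of y2 per AAA group; the lambda parameter `d` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"AAA": ["x1", "x1", "x1", "x2", "x2", "x2"],
                   "BBB": ["y1", "y1", "y2", "y2", "y2", "y1"],
                   "CCC": ["t1", "t2", "t3", "t1", "t1", "t1"],
                   "DDD": [10, 11, 18, 17, 21, 30]})

# pipe() passes the entire DataFrame to the lambda, so the per-group value
# can be computed and attached without leaving the method chain.
df = df.pipe(
    lambda d: d.assign(
        # Mean of a 0/1 "is y2" indicator within each AAA group = share of y2;
        # transform("mean") broadcasts it back to every row of the group.
        # assign() needs ** unpacking because the column name contains spaces.
        **{"Ratio of BBB": d["BBB"].eq("y2").astype(float)
                                  .groupby(d["AAA"])
                                  .transform("mean")}
    )
)
print(df)
```

Since `assign` returns a new DataFrame, this avoids the `unstack`/`reset_index`/`merge` round trip entirely.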
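Edit: one solution using pipe()
Note: user jpp explicitly commented below that "unstack/merge/reset_index operations are unnecessary and expensive".
However, this was the approach I originally had in mind, so I thought I would share it here!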
df = (df
      .groupby(df.AAA)   # groupby the column
      .BBB               # select the column with values to calculate ('BBB' with y1 & y2)
      .value_counts()    # calculate the values (# of y1 per group, # of y2 per group)
      .unstack()         # turn the rows into columns (y1, y2)
      .pipe(lambda counts: counts["y1"] / counts["y2"])  # ratio of y1:y2 (outputs a Series)
      .rename("ratio")   # name the Series 'ratio' so it becomes the ratio column in the output df
      .reset_index()     # turn the grouped Series back into a DataFrame
      .merge(df)         # merge with the original DataFrame on the shared key column (AAA)
      )
Best answer
It looks like what you want is the ratio of y2 to the total. Use groupby + value_counts:
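# Count each BBB value per AAA group: one row per group, one column per BBB value.
# fill_value=0 guards against a group that has no y1 or no y2 at all.
v = df.groupby('AAA').BBB.value_counts().unstack(fill_value=0)
# Share of y2 in each group, mapped back onto every row via the AAA key.
df['RATIO'] = df.AAA.map(v.y2 / (v.y2 + v.y1))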
  AAA BBB CCC  DDD     RATIO
0  x1  y1  t1   10  0.333333
1  x1  y1  t2   11  0.333333
2  x1  y2  t3   18  0.333333
3  x2  y2  t1   17  0.666667
4  x2  y2  t1   21  0.666667
5  x2  y1  t1   30  0.666667
To generalize to many groups (more BBB categories than just y1 and y2), you can use:
df['RATIO'] = df.AAA.map(v.y2 / v.sum(axis=1))
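To see the generalized formula at work, here is a sketch on made-up data: the seven-row frame and the third category y3 are hypothetical, chosen only so that the row sum differs from y1 + y2. `pd.crosstab(..., normalize="index")` is a swapped-in alternative (not part of the answer) that yields the same shares for every category at once.

```python
import pandas as pd

# Hypothetical data with a third BBB category, y3.
df = pd.DataFrame({"AAA": ["x1", "x1", "x1", "x2", "x2", "x2", "x2"],
                   "BBB": ["y1", "y1", "y2", "y2", "y3", "y1", "y3"]})

# Counts per (AAA, BBB); fill_value=0 covers x1, which has no y3 rows.
v = df.groupby("AAA").BBB.value_counts().unstack(fill_value=0)

# Share of y2 among all BBB categories in each group.
df["RATIO"] = df.AAA.map(v.y2 / v.sum(axis=1))

# Alternative: a contingency table whose rows are normalized to sum to 1.
shares = pd.crosstab(df["AAA"], df["BBB"], normalize="index")
print(df)
print(shares)
```

Here x1 gets 1/3 (one y2 out of three rows) and x2 gets 1/4 (one y2 out of four rows), whereas y2 / (y2 + y1) would have given 1/2 for x2.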
Source: the Stack Overflow question "python - pandas: calculate the ratio of two categories per group and append it to the dataframe as a new column using .pipe()": https://stackoverflow.com/questions/50892309/