python - Pandas: count distinct values in multiple columns of a DataFrame and group by multiple columns


I found the answer: test2 = test_pd.groupby(by=['ID'])[['country', 'color']].nunique().reset_index()

idk why this question was marked as a duplicate when the link rafael provided doesn't answer it.

I have a DataFrame with 3 columns:

   country    color    ID 
0  Germany    Red      12     
1  France     Red      13
2  US         Blue     11
3  France     Red      11

If I wanted to find the number of distinct countries and colors for each ID in SQL, it would be

select  ID
  , count(distinct(country)) as num_countries
  , count(distinct(color)) as num_color
from table_name
group by ID;

The result would look like this

   ID    num_countries   num_color
0  12         1              1   
0  11         2              2   
0  13         1              1

How can I get the same result in Pandas?

Best answer

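Use DataFrameGroupBy.nunique, selecting the columns with a list (double brackets), since indexing a groupby with a bare tuple of column names is no longer supported in recent pandas versions:

df_unique = df.groupby('ID')[['country', 'color']].nunique().add_prefix('num_').reset_index()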
print(df_unique)

   ID  num_country  num_color
0  11            2          2
1  12            1          1
2  13            1          1
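
If you want the column names to match the SQL aliases exactly (num_countries rather than num_country), named aggregation is one option. The following is a minimal, self-contained sketch, assuming pandas 0.25 or later and using the sample data from the question; the variable name result is just illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country": ["Germany", "France", "US", "France"],
    "color": ["Red", "Red", "Blue", "Red"],
    "ID": [12, 13, 11, 11],
})

# Named aggregation: each keyword becomes an output column name,
# mirroring the SQL aliases num_countries and num_color.
result = (
    df.groupby("ID")
      .agg(num_countries=("country", "nunique"),
           num_color=("color", "nunique"))
      .reset_index()
)
print(result)
```

Note that nunique ignores NaN by default, which matches how count(distinct ...) skips NULLs in SQL.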

Regarding python - Pandas: count distinct values in multiple columns of a DataFrame and group by multiple columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58648442/
