python-3.x - Count over multiple columns, list the counts in separate columns, and keep one column

Tags: python-3.x pandas dataframe count range

I have the following DataFrame:

    id  coord_id    val1    val2    record  val3
0   snp chr15_1-1000    1.0 0.9 xx12    2
1   snv chr15_1-1000    1.0 0.7 yy12    -4
2   ins chr15_1-1000    0.01    0.7 jj12    -4
3   ins chr15_1-1000    1.0 1.5 zzy1    -5
4   ins chr15_1-1000    1.0 1.5 zzy1    -5
5   del chr10_2000-4000 0.1 1.2 j112    12
6   del chr10_2000-4000 0.4 1.1 jh12    15
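To make the question reproducible, here is a minimal sketch that rebuilds the sample frame above. The column dtypes are an assumption (pandas infers float for val1/val2 and int for val3 from the literals).

```python
import pandas as pd

# Rebuild the sample frame from the question; dtypes are inferred by pandas
df = pd.DataFrame({
    "id": ["snp", "snv", "ins", "ins", "ins", "del", "del"],
    "coord_id": ["chr15_1-1000"] * 5 + ["chr10_2000-4000"] * 2,
    "val1": [1.0, 1.0, 0.01, 1.0, 1.0, 0.1, 0.4],
    "val2": [0.9, 0.7, 0.7, 1.5, 1.5, 1.2, 1.1],
    "record": ["xx12", "yy12", "jj12", "zzy1", "zzy1", "j112", "jh12"],
    "val3": [2, -4, -4, -5, -5, 12, 15],
})
print(df)
```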

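I'm trying to count how many times each id occurs for each coord_id, while keeping the val1 column in the result table, but only as the range of its values. For example, I'm trying to achieve the following result: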

  id            snp    snv         ins    del   total val1  
chr15_1-1000    1       1           3      0     5     0.01-1.0
chr10_2000-4000 0       0           0      2     2    0.1-0.4
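The val1 column in the expected output is the per-coord_id minimum and maximum of val1 joined with a hyphen. A minimal sketch of just that piece, assuming only the two relevant columns of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "coord_id": ["chr15_1-1000"] * 5 + ["chr10_2000-4000"] * 2,
    "val1": [1.0, 1.0, 0.01, 1.0, 1.0, 0.1, 0.4],
})

# Per coord_id, take the min and max of val1, then format them as "min-max"
bounds = df.groupby("coord_id")["val1"].agg(["min", "max"])
val1_range = bounds["min"].astype(str) + "-" + bounds["max"].astype(str)
print(val1_range)
```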

I'd like to sort by the total column in ascending order.

Thanks a lot in advance.

Best answer

First, pivot the id values into columns using a count aggregation with marginal totals (margins). Then join() the val1 min-max strings:

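import pandas as pd

# df is the DataFrame from the question
(df.pivot_table(index='coord_id', columns='id', values='val1',
                aggfunc='count', fill_value=0,            # count rows per (coord_id, id)
                margins=True, margins_name='total')       # adds a 'total' column and a 'total' row
   .join(df.groupby('coord_id').val1.agg(lambda x: f'{x.min()}-{x.max()}'))  # val1 "min-max" string
   .sort_values('total', ascending=False)                # largest total first
   .drop('total'))                                        # drop the margin row labeled 'total'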

#                  del  ins  snp  snv  total      val1
# coord_id                                            
# chr15_1-1000       0    3    1    1      5  0.01-1.0
# chr10_2000-4000    2    0    0    0      2   0.1-0.4
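An alternative sketch of the same result, swapping pivot_table for pd.crosstab and computing the total column explicitly with a row-wise sum, so no margin row has to be dropped afterwards. The sample data is rebuilt inline (only the columns that matter) so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["snp", "snv", "ins", "ins", "ins", "del", "del"],
    "coord_id": ["chr15_1-1000"] * 5 + ["chr10_2000-4000"] * 2,
    "val1": [1.0, 1.0, 0.01, 1.0, 1.0, 0.1, 0.4],
})

# Count rows per (coord_id, id) pair; missing combinations become 0
counts = pd.crosstab(df["coord_id"], df["id"])
# Row-wise total across the id columns (computed before 'total' exists)
counts["total"] = counts.sum(axis=1)

# val1 range per coord_id, formatted as "min-max"
val1 = df.groupby("coord_id")["val1"].agg(lambda x: f"{x.min()}-{x.max()}")

# Attach the range column and put the largest total first
result = counts.join(val1).sort_values("total", ascending=False)
print(result)
```

Passing ascending=True to sort_values instead lists the smallest total first.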

Regarding python-3.x - Count over multiple columns, list the counts in separate columns, and keep one column, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67454148/
