python - GroupBy columns on column header prefix

Tags: python pandas dataframe group-by pandas-groupby

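I have a dataframe whose column names each start with one of a known list of prefixes. I want the sum of all values in the dataframe, grouped by the prefix that the columns share.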

import pandas as pd

df = pd.DataFrame([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]],
              columns=['abc', 'abd', 'wxy', 'wxz'])
prefixes = ['ab','wx']
df
    abc abd wxy wxz
0   1   2   3   4
1   1   2   3   4
2   1   2   3   4
3   1   2   3   4

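The only way I have figured out to do this is to loop over the list of prefixes, select the columns of the dataframe that start with each prefix, and sum the result.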

results = []
for p in prefixes:
  results.append([p, df.loc[:, df.columns.str.startswith(p)].values.sum()])
results = pd.DataFrame(results,)
results.set_index(keys=[0], drop=True).T

    ab  wx
1   12  28

I was hoping for a more elegant way to do this, perhaps with groupby(), but I could not work it out.

Best answer

First, determine which prefix each column belongs to. Then use that mapping to perform the groupby.

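# Use startswith rather than `p in c`, which would also match a prefix
# that appears in the middle of a column name.
grouper = [next(p for p in prefixes if c.startswith(p)) for c in df.columns]
# Note: groupby(axis=1) is deprecated in recent pandas releases (2.1+).
u = df.groupby(grouper, axis=1).sum()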

   ab  wx
0   3   7
1   3   7
2   3   7
3   3   7

That is almost it. Sum over the rows and reshape the result into a single-row frame:

u.sum().to_frame().T

   ab  wx
0  12  28
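
Because groupby(axis=1) is deprecated in newer pandas versions, the same idea can be sketched by grouping the transposed frame instead. This is a minimal variation on the answer above, not the original author's code. The None default passed to next() is an assumption added here so that a column matching no prefix does not raise StopIteration; groupby drops NA group keys by default, so such columns are expected to be left out of the totals.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=['abc', 'abd', 'wxy', 'wxz'])
prefixes = ['ab', 'wx']

# Map each column to the first prefix it starts with (None if no prefix matches).
grouper = [next((p for p in prefixes if c.startswith(p)), None) for c in df.columns]

# Group the rows of the transposed frame instead of passing axis=1,
# then transpose back so the prefixes become columns again.
u = df.T.groupby(grouper).sum().T

# Collapse the per-row sums into a single row of totals per prefix.
result = u.sum().to_frame().T
```

Here u holds the per-row sums for each prefix, and result is the single-row frame with the totals for ab and wx.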

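Another option is to vectorize the prefix matching with np.char.startswith and argmax. Note that argmax returns 0 for a column that matches no prefix, so this assumes every column starts with one of the prefixes and every prefix matches at least one column: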

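import numpy as np

# For each column, find the position of the first prefix it starts with
idx = np.char.startswith(
    df.columns.values[:, None].astype(str), prefixes).argmax(1)

# Group columns by prefix position, total each group, and label by prefix
(pd.Series(df.groupby(idx, axis=1).sum().sum().values, index=prefixes)
   .to_frame()
   .transpose())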

   ab  wx
0  12  28
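
If only the grand totals per prefix are needed, the boolean match matrix from np.char.startswith can be used directly, without groupby or argmax. The following is a sketch of that variation, not part of the original answer. The names cols, matches, col_totals and totals are illustrative. A column that matches no prefix contributes to no total, while a column that matches several prefixes would be counted under each of them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=['abc', 'abd', 'wxy', 'wxz'])
prefixes = ['ab', 'wx']

# Boolean matrix of shape (n_columns, n_prefixes): matches[i, j] is True
# when column i starts with prefix j.
cols = df.columns.to_numpy().astype(str)
matches = np.char.startswith(cols[:, None], prefixes)

# Total each column once, then add the column totals up per prefix
# with a single matrix product against the 0/1 match matrix.
col_totals = df.sum().to_numpy()
totals = pd.Series(col_totals @ matches.astype(int), index=prefixes).to_frame().T
```

The matrix product replaces the groupby step, which also sidesteps the axis=1 deprecation.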

This page on "python - GroupBy columns on column header prefix" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/54207038/
