python - Arithmetic operations on a groupby pandas DataFrame

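I have a Pandas DataFrame with 40 columns and 400,000 rows. I created a summarized dataset by grouping on 3 columns.

Now I need to compute a percentage metric from two of the columns. Python throws an error: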

unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'

Here is the sample code:

print(sample_data)
   date  part  receipt  bad_dollars  total_dollars  bad_percent
0     1   123       22           40            100          NaN
1     2   456       44           80            120          NaN
2     3   134       33           30            150          NaN
3     1   123       22           80            100          NaN
4     5   456       45           40             90          NaN
5     3   134       33           85            150          NaN
6     7   123       24           70            120          NaN
7     5   456       45           20             85          NaN
8     9   134       35           50            300          NaN
9     7   123       24          300            600          NaN

sample_data_group = sample_data.groupby(['date','part','receipt'])

sample_data_group['bad_percents']=sample_data_group['bad_dollars']/sample_data_group['total_dollars']

TypeError: unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'

Please help!

Best answer

You can do this by using apply on the groupby object:

import pandas as pd
import numpy as np

cols = ['index', 'date',  'part',  'receipt',  'bad_dollars',  'total_dollars',
        'bad_percent']
sample_data = pd.DataFrame([
[0,     1,   123,       22,           40,            100,          np.nan],
[1,     2,   456,       44,           80,            120,          np.nan],
[2,     3,   134,       33,           30,            150,          np.nan],
[3,     1,   123,       22,           80,            100,          np.nan],
[4,     5,   456,       45,           40,             90,          np.nan],
[5,     3,   134,       33,           85,            150,          np.nan],
[6,     7,   123,       24,           70,            120,          np.nan],
[7,     5,   456,       45,           20,             85,          np.nan],
[8,     9,   134,       35,           50,            300,          np.nan],
[9,     7,   123,       24,          300,            600,          np.nan]],
                           columns = cols).set_index('index', drop = True)

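# Group on the three key columns.
sample_data_group = sample_data.groupby(['date','part','receipt'])

# Within each group, compute bad_percent row by row with assign(), then drop
# the group-key index levels that apply() adds so the original row index remains.
xx = sample_data_group.apply(
         lambda x: x.assign(bad_percent = x.bad_dollars/x.total_dollars))\
                      .reset_index(['date','part', 'receipt'], drop = True)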
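
The error occurs because a SeriesGroupBy is a lazy grouping object, not a column of numbers, so it cannot be divided until it has been aggregated. Here is a minimal sketch of two alternatives. It assumes the goal is either one ratio per (date, part, receipt) group, computed from the group totals, or a plain per-row ratio. The names `summary` and `row_level` are illustrative only and do not appear in the original question.

```python
import pandas as pd

sample_data = pd.DataFrame({
    "date":          [1, 2, 3, 1, 5, 3, 7, 5, 9, 7],
    "part":          [123, 456, 134, 123, 456, 134, 123, 456, 134, 123],
    "receipt":       [22, 44, 33, 22, 45, 33, 24, 45, 35, 24],
    "bad_dollars":   [40, 80, 30, 80, 40, 85, 70, 20, 50, 300],
    "total_dollars": [100, 120, 150, 100, 90, 150, 120, 85, 300, 600],
})

# Option 1 (assumed intent: one ratio per group).
# Aggregate first: sum both dollar columns within each group.
summary = (
    sample_data
    .groupby(["date", "part", "receipt"], as_index=False)[["bad_dollars", "total_dollars"]]
    .sum()
)
# The aggregated columns are ordinary Series, so element-wise division works.
summary["bad_percent"] = summary["bad_dollars"] / summary["total_dollars"]

# Option 2 (assumed intent: one ratio per original row).
# No groupby is needed; divide the two columns directly.
row_level = sample_data.assign(
    bad_percent=sample_data["bad_dollars"] / sample_data["total_dollars"]
)

print(summary)
print(row_level)
```

Option 1 returns one row per group. Option 2 keeps all ten rows and should give the same bad_percent values as the apply-based answer above, without the overhead of calling a Python function for every group.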

For more on python - Arithmetic operations on a groupby pandas DataFrame, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/38342528/
