python - Return a range between two numbers, grouped by ID

I have a DataFrame like this:

import pandas as pd

d = {'id': pd.Series(['1', '2', '3', '4', '5', '6']),
     'count': pd.Series([11, 0, 2, 0, 1, 3])}

df = pd.DataFrame(d)

Is there a way to make each ID count from 0 up to the number given in the count column? For example:

id  count  count_2
------------------
1     11        0
                1
                2
                3
              ...
               11
2      0        0
3      3        0
                1
                2
                3
...

Best answer

Build a new column of ranges, then expand it with DataFrame.explode:

df['count_2'] = df['count'].apply(lambda x: range(x+1))
df = df.explode('count_2').reset_index(drop=True)
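A self-contained sketch of the explode approach, using the sample data from the question. One detail worth knowing: DataFrame.explode leaves the exploded column with object dtype, so the cast at the end is an optional extra step (not part of the answer above) for when an integer column is wanted. The name `out` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6'],
                   'count': [11, 0, 2, 0, 1, 3]})

# One range object per row: 0, 1, ..., count (inclusive, hence the +1).
out = df.copy()
out['count_2'] = out['count'].apply(lambda x: range(x + 1))

# explode turns every element of each range into its own row.
out = out.explode('count_2').reset_index(drop=True)

# The exploded column has object dtype; cast it if integers are needed.
out['count_2'] = out['count_2'].astype('int64')

print(out.head(13))
```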

Another idea, using Index.repeat and GroupBy.cumcount (thanks to @adir abargil for the idea):

df = df.loc[df.index.repeat(df['count'].add(1))]
df['count_2'] = df.groupby(level=0).cumcount()
df = df.reset_index(drop=True)

print (df)    
   id  count count_2
0   1     11       0
1   1     11       1
2   1     11       2
3   1     11       3
4   1     11       4
5   1     11       5
6   1     11       6
7   1     11       7
8   1     11       8
9   1     11       9
10  1     11      10
11  1     11      11
12  2      0       0
13  3      2       0
14  3      2       1
15  3      2       2
16  4      0       0
17  5      1       0
18  5      1       1
19  6      3       0
20  6      3       1
21  6      3       2
22  6      3       3
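The repeat-and-cumcount steps can also be run as one self-contained script with a couple of sanity checks. This is a sketch on the question's sample data; the checks and the names `out` and `expected_rows` are illustrative additions, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6'],
                   'count': [11, 0, 2, 0, 1, 3]})

# Repeat each row count + 1 times; the repeated rows keep their original index label.
out = df.loc[df.index.repeat(df['count'].add(1))]

# Grouping by that repeated index label numbers the copies 0, 1, 2, ... per original row.
out['count_2'] = out.groupby(level=0).cumcount()
out = out.reset_index(drop=True)

# Sanity checks: total row count, and each id's counter ends at its count.
expected_rows = int((df['count'] + 1).sum())
max_per_id = out.groupby('id')['count_2'].max()

print(len(out) == expected_rows)
print((max_per_id == df.set_index('id')['count']).all())
```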

Finally, if you need to replace the repeated values with empty strings:

df.loc[df.duplicated(['id','count']), ['id','count']] = ''
print (df)
   id count count_2
0   1    11       0
1                 1
2                 2
3                 3
4                 4
5                 5
6                 6
7                 7
8                 8
9                 9
10               10
11               11
12  2     0       0
13  3     2       0
14                1
15                2
16  4     0       0
17  5     1       0
18                1
19  6     3       0
20                1
21                2
22                3
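Writing '' into the integer count column mixes strings with numbers, and newer pandas versions may warn about the incompatible dtype. A possible variant, shown here as an assumption rather than as the answer's method, converts the two columns to object dtype first and blanks the duplicates with Series.mask on a copy, so the numeric frame stays intact. The small frame and the name `shown` are hypothetical.

```python
import pandas as pd

# A small already-expanded frame, standing in for the result above.
df = pd.DataFrame({'id': ['1', '1', '1', '2'],
                   'count': [2, 2, 2, 0],
                   'count_2': [0, 1, 2, 0]})

# Work on a copy meant only for display; df keeps its numeric columns.
shown = df.copy()

# True for every row whose (id, count) pair already appeared earlier.
dup = shown.duplicated(['id', 'count'])

# Convert to object dtype first, then blank out the repeated values.
for col in ['id', 'count']:
    shown[col] = shown[col].astype(object).mask(dup, '')

print(shown)
```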

Performance:

#23k rows
df = pd.concat([df] * 1000, ignore_index=True)


def f(df):
    df = df.loc[df.index.repeat(df['count'].add(1))]
    df['count_2'] = df.groupby(level=0).cumcount()
    return df.reset_index(drop=True)

In [55]: %%timeit
    ...: f(df)
    ...: 
5.57 ms ± 39.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [56]: %%timeit
    ...: df['count_2'] = df['count'].apply(lambda x: range(x+1))
    ...: df.explode('count_2').reset_index(drop=True)
    ...: 
    ...: 
20.2 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
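The timings above use IPython's %%timeit magic. Outside IPython, a rough comparison can be sketched with the standard-library timeit module. The function names and the repeat counts below are arbitrary choices, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6'],
                     'count': [11, 0, 2, 0, 1, 3]})

# 6,000 input rows, which expand to 23,000 output rows.
big = pd.concat([base] * 1000, ignore_index=True)


def repeat_cumcount(frame):
    out = frame.loc[frame.index.repeat(frame['count'].add(1))]
    out['count_2'] = out.groupby(level=0).cumcount()
    return out.reset_index(drop=True)


def explode_ranges(frame):
    # assign returns a new frame, so the input is left unmodified.
    out = frame.assign(count_2=frame['count'].apply(lambda x: range(x + 1)))
    return out.explode('count_2').reset_index(drop=True)


# Best of 3 runs, 10 calls each, reported as seconds per call.
t_repeat = min(timeit.repeat(lambda: repeat_cumcount(big), number=10, repeat=3)) / 10
t_explode = min(timeit.repeat(lambda: explode_ranges(big), number=10, repeat=3)) / 10

print(f"repeat + cumcount: {t_repeat * 1000:.2f} ms per call")
print(f"apply + explode:   {t_explode * 1000:.2f} ms per call")
```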

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/65624256/
