python-3.x - Repeat an array up to a given length with groupby in pandas

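In R's data.table, there is a way to repeat a vector until it reaches a given final length. For example, I created a variable named "period", which is the vector q repeated within each group.

R code: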

require(data.table)
q = c(1:3)
test = data.table(IDx = c(rep('42', 5) , rep('76', 3), rep('43', 3), rep('5', 2)),
                  IDy = c(rep('A', 5) , rep('A', 3), rep('B', 3) , rep('C',2)))
test[, period := rep(q, length.out = .N), by =c('IDx','IDy')]

   IDx IDy period
 1:  42   A      1
 2:  42   A      2
 3:  42   A      3
 4:  42   A      1
 5:  42   A      2
 6:  76   A      1
 7:  76   A      2
 8:  76   A      3
 9:  43   B      1
10:  43   B      2
11:  43   B      3
12:   5   C      1
13:   5   C      2

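I am trying to replicate this in Python, but I'm a bit stuck. cumcount on its own only numbers the rows within each group; the numbering should instead follow the sequence q, and once the last index of q is reached, q should start again.

import pandas as pd

q = [1,2,3]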
valuesX = ['42'] * 5 + ['76'] * 3 + ['43'] * 3 + ['5'] * 2 
valuesY = ['A'] * 5 + ['A'] * 3 + ['B'] * 3 + ['C'] * 2
test = pd.DataFrame({'IDx':valuesX,
                    'IDy':valuesY})

print(test.groupby(['IDx','IDy']).cumcount()+1)

What I tried:

def repeat(seq, ind):
    length = len(ind.index)
    print(length)
    multiple, remainder = divmod(length, len(seq))
    test['t'] = seq * multiple + seq[:remainder]

print(test.groupby(['IDx']).apply(lambda x: repeat(q, x)))
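For reference, the divmod idea in the attempt above can be made to work. The attempt assigns a group-sized list to the whole frame and returns nothing from the function. The following is only a minimal sketch, assuming the result should be written group by group; the helper name `repeat_to_length` is hypothetical and not part of pandas.

```python
import pandas as pd

q = [1, 2, 3]
valuesX = ['42'] * 5 + ['76'] * 3 + ['43'] * 3 + ['5'] * 2
valuesY = ['A'] * 5 + ['A'] * 3 + ['B'] * 3 + ['C'] * 2
test = pd.DataFrame({'IDx': valuesX, 'IDy': valuesY})


def repeat_to_length(seq, n):
    """Hypothetical helper: repeat the list seq until it has exactly n items."""
    multiple, remainder = divmod(n, len(seq))
    return seq * multiple + seq[:remainder]


# Start with a placeholder column, then fill it one group at a time.
period = pd.Series(0, index=test.index)
for _, labels in test.groupby(['IDx', 'IDy']).groups.items():
    # labels holds the row labels belonging to this (IDx, IDy) group
    period.loc[labels] = repeat_to_length(q, len(labels))

test['t'] = period
print(test)
```

An explicit loop over the groups is slower than a vectorized approach on large frames, but it keeps the logic close to the original attempt.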

Best answer

Combine groupby.cumcount with mod, then use numpy indexing (you could also map the values):

import numpy as np

s = test.groupby(['IDx', 'IDy']).cumcount().mod(len(q))

test['period'] = np.array(q)[s]

Output:

   IDx IDy  period
0   42   A       1
1   42   A       2
2   42   A       3
3   42   A       1
4   42   A       2
5   76   A       1
6   76   A       2
7   76   A       3
8   43   B       1
9   43   B       2
10  43   B       3
11   5   C       1
12   5   C       2
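The answer mentions that the values could also be mapped instead of indexed with numpy. A minimal sketch of that variant follows, assuming a plain dict built from q is an acceptable lookup; the names `pos` and `lookup` are illustrative only.

```python
import pandas as pd

q = [1, 2, 3]
valuesX = ['42'] * 5 + ['76'] * 3 + ['43'] * 3 + ['5'] * 2
valuesY = ['A'] * 5 + ['A'] * 3 + ['B'] * 3 + ['C'] * 2
test = pd.DataFrame({'IDx': valuesX, 'IDy': valuesY})

# 0-based position of each row within its (IDx, IDy) group, wrapped at len(q)
pos = test.groupby(['IDx', 'IDy']).cumcount().mod(len(q))

# Lookup from wrapped position to the corresponding element of q
lookup = dict(enumerate(q))

test['period'] = pos.map(lookup)
print(test)
```

Mapping through a dict also works when q holds non-numeric values such as strings, since no array indexing is involved.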

This page on "python-3.x - Repeat an array up to a given length with groupby in pandas" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/76624716/
