python - Different sample size for each customer

I have a data frame like this:

    Customer   Day
0.    A         1
1.    A         1
2.    A         1
3.    A         2
4.    B         3
5.    B         4

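I want to sample from it, but with a different sample size for each customer. The size for each customer is stored in another data frame (in its Day column). For example: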

    Customer   Day
0.    A         2
1.    B         1

Suppose I want to sample for each customer on each day. So far I have this function:

def sampling(frame,a): 
    return np.random.choice(frame.Id,size=a) 

grouped = frame.groupby(['Customer','Day'])
sampled = grouped.apply(sampling, a=??).reset_index()

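If I set the size argument to a global constant, it runs without problems. But I don't know how to set it when the different values live in a separate data frame.

Best answer

You can build a mapper from df1, the data frame that holds the sample sizes, and look up each customer's value to use as the sample size: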

mapper = df1.set_index('Customer')['Day'].to_dict()

df.groupby('Customer', as_index=False).apply(lambda x: x.sample(n = mapper[x.name]))


       Customer Day
0   3   A       2
    2   A       1
1   4   B       3

This returns a MultiIndex; you can always call reset_index to flatten it:

df.groupby('Customer').apply(lambda x: x.sample(n = mapper[x.name])).reset_index(drop = True)

    Customer    Day
0   A           1
1   A           1
2   B           3
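
For reference, here is a minimal self-contained sketch that rebuilds the two data frames from the question and draws the per-customer samples. It uses a different technique from the apply/sample call above: shuffle all rows once, number the rows inside each group with cumcount, and keep the rows whose number is below the customer's size. The helper name sample_by_group and its by and seed parameters are hypothetical, introduced only for illustration. One assumption to note: if a group has fewer rows than the requested size, this version simply returns all of its rows instead of raising an error.

```python
import pandas as pd

# Data from the question: the rows to sample from ...
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "B", "B"],
    "Day": [1, 1, 1, 2, 3, 4],
})
# ... and the per-customer sample sizes (stored in the "Day" column).
df1 = pd.DataFrame({"Customer": ["A", "B"], "Day": [2, 1]})

# Same mapper as in the answer: {"A": 2, "B": 1}
mapper = df1.set_index("Customer")["Day"].to_dict()


def sample_by_group(frame, sizes, by="Customer", seed=None):
    """Sample up to sizes[customer] rows from each group defined by `by`.

    Hypothetical helper: sampling is without replacement, and groups
    smaller than the requested size are returned whole (an assumption).
    """
    # Shuffle every row once; the order within each group is now random.
    shuffled = frame.sample(frac=1, random_state=seed)
    # Position of each row inside its group, in shuffled order: 0, 1, 2, ...
    rank = shuffled.groupby(by).cumcount()
    # Requested size for each row, looked up through its customer.
    wanted = shuffled["Customer"].map(sizes)
    # Keep the first `wanted` rows of each group, restore original row order.
    return shuffled[rank < wanted].sort_index()


# One sample per customer: 2 rows for A, 1 row for B.
per_customer = sample_by_group(df, mapper, by="Customer", seed=0)
print(per_customer)

# One sample per customer and day, as the question describes.
per_day = sample_by_group(df, mapper, by=["Customer", "Day"], seed=0)
print(per_day)
```

Passing by=["Customer", "Day"] covers the per-day case from the question. Because customer A has only one row on day 2, that group contributes a single row even though A's size is 2.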

Regarding python - Different sample size for each customer, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58794340/
