python - Moving window over categorical values with Pandas

I have a Pandas Series like this:

import pandas as pd

s = pd.Series(["A", "A", "B", "C", "A", "C", "A", "C", "A", "B", "B", "B", "A", "A", "C"])

I want to get the count or proportion of each letter in non-overlapping windows of size 4.

I tried this:

pd.rolling_apply(s, 4, pd.value_counts)

But it doesn't work:

ValueError: could not convert string to float: C

Any ideas on how to do this?

Best answer

Because your Series has a RangeIndex, you can create non-overlapping windows by floor-dividing the index by the window size:

print(s.index // 4)
# => Int64Index([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3], dtype='int64')

Group by the result above and call .value_counts():

s.groupby(s.index // 4).value_counts()

# 0  A    2
#    B    1
#    C    1
# 1  A    2
#    C    2
# 2  B    3
#    A    1
# 3  A    2
#    C    1
# dtype: int64
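If a table with one row per window and one column per letter is easier to read, the grouped counts can be pivoted. This is a minimal sketch, not part of the original answer; the names `counts` and `table` are chosen here for illustration only.

```python
import pandas as pd

s = pd.Series(["A", "A", "B", "C", "A", "C", "A", "C", "A", "B", "B", "B", "A", "A", "C"])

# Count each letter within every block of 4 consecutive positions.
counts = s.groupby(s.index // 4).value_counts()

# Move the letter level into columns; letters absent from a window get 0.
table = counts.unstack(fill_value=0)
print(table)
```

Letters that never occur in a window (for example "B" in window 1) show up as 0 instead of being left out.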

If you prefer proportions to counts, pass normalize=True to .value_counts():

s.groupby(s.index // 4).value_counts(normalize=True)

# 0  A    0.500000
#    B    0.250000
#    C    0.250000
# 1  A    0.500000
#    C    0.500000
# 2  B    0.750000
#    A    0.250000
# 3  A    0.666667
#    C    0.333333
# dtype: float64
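The last window contains only three values, because 15 is not a multiple of 4, which is why its proportions are 2/3 and 1/3. The sketch below, which is an illustration and not part of the original answer, checks the window sizes and confirms that the proportions sum to 1 within each window; `sizes`, `props` and `totals` are names picked for this example.

```python
import pandas as pd

s = pd.Series(["A", "A", "B", "C", "A", "C", "A", "C", "A", "B", "B", "B", "A", "A", "C"])

window = s.index // 4

# Number of elements that fall into each window.
sizes = s.groupby(window).size()

# Per-window proportions, normalized by the size of that window.
props = s.groupby(window).value_counts(normalize=True)

# Summing over the window level should give 1 for every window.
totals = props.groupby(level=0).sum()
print(sizes.tolist())
print(totals.round(6).tolist())
```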

If your Series has a different kind of index, you can still generate the window labels as follows:

pd.Series(range(len(s))) // 4

# 0     0
# 1     0
# 2     0
# 3     0
# 4     1
# 5     1
# 6     1
# 7     1
# 8     2
# 9     2
# 10    2
# 11    2
# 12    3
# 13    3
# 14    3
# dtype: int64

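The two groupby lines below then produce the same counts and proportions as above. Pass the labels as a plain array (.to_numpy()) so that pandas groups by position; a Series passed to groupby is first aligned with the index of s, which would fail to match when s does not have a 0..n-1 index:

windows = (pd.Series(range(len(s))) // 4).to_numpy()
s.groupby(windows).value_counts()
s.groupby(windows).value_counts(normalize=True)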
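For a Series whose index is not 0..n-1, one way to keep the grouping purely positional is to build the window labels with NumPy. The following is a sketch under that assumption: `window_value_counts` is a hypothetical helper name, and the date index is made up for the example.

```python
import numpy as np
import pandas as pd


def window_value_counts(series, size, normalize=False):
    """Count values in non-overlapping windows of `size` consecutive rows.

    Hypothetical helper: the labels come from row positions, so the
    Series index can be of any type.
    """
    labels = np.arange(len(series)) // size
    return series.groupby(labels).value_counts(normalize=normalize)


# Same letters as in the question, but with a date index (illustrative).
s = pd.Series(
    list("AABCACACABBBAAC"),
    index=pd.date_range("2024-01-01", periods=15, freq="D"),
)

counts = window_value_counts(s, 4)
props = window_value_counts(s, 4, normalize=True)
print(counts)
```

Because a NumPy array carries no index, pandas uses the labels in row order and no alignment takes place.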

A similar question on this topic (moving window over categorical values with Pandas) can be found on Stack Overflow: https://stackoverflow.com/questions/24740127/
