python - Pandas : how to get the unique number of values in cells when cells contain lists?

For some mysterious reason, I have a DataFrame that looks like this:

index             col_weird      col_normal
2012-01-01 14:30  ['A','B']      2
2012-01-01 14:32  ['A','C','D']  4
2012-01-01 14:36  ['C','D']      2
2012-01-01 14:39  ['E','B']      4
2012-01-01 14:40  ['G','H']      2
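
To reproduce this, the frame can be rebuilt roughly as follows. This is only a sketch: it assumes the index is a DatetimeIndex and that col_weird holds real Python lists rather than their string representations.

```python
import pandas as pd

# Sample data copied from the table above; col_weird cells are Python lists.
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=pd.to_datetime(
        [
            "2012-01-01 14:30",
            "2012-01-01 14:32",
            "2012-01-01 14:36",
            "2012-01-01 14:39",
            "2012-01-01 14:40",
        ]
    ),
)
print(df)
```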

I would like to resample the DataFrame every 5 minutes and

get the number of unique elements across all the lists in col_weird,
get the mean of col_normal.

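Of course, using resample().col_weird.nunique() fails for the first task, because I want the number of unique elements: that is, between 14:30 and 14:35 I expect the number to be 4, corresponding to A, B, C, D.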

Over the same period, the mean of col_normal is of course 3.

Any idea how to get this?

Thanks!

Best answer

I think you can first expand the lists into a Series:

df = df['col_weird'].apply(pd.Series).stack().reset_index(drop=True, level=1)
print(df)
2012-01-01 14:30    A
2012-01-01 14:30    B
2012-01-01 14:32    A
2012-01-01 14:32    C
2012-01-01 14:32    D
2012-01-01 14:36    C
2012-01-01 14:36    D
2012-01-01 14:39    E
2012-01-01 14:39    B
2012-01-01 14:40    G
2012-01-01 14:40    H
dtype: object
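
As an alternative sketch, newer pandas versions (0.25 and later) offer Series.explode, which flattens list cells into one row per element while repeating the index. The sample frame below is rebuilt from the question's table, and the variable name flat is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed: DatetimeIndex, list cells).
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=pd.to_datetime(
        [
            "2012-01-01 14:30",
            "2012-01-01 14:32",
            "2012-01-01 14:36",
            "2012-01-01 14:39",
            "2012-01-01 14:40",
        ]
    ),
)

# One row per list element; each timestamp is repeated for every element.
flat = df["col_weird"].explode()
print(flat)
```

Keeping the result in a separate variable, rather than overwriting df, leaves col_normal available for later aggregation.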

Then use resample:

df = df.resample('1H').nunique()
print (df)
2012-01-01 14:00:00    7
Freq: H, dtype: int64
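
The question asks for 5-minute bins plus the mean of col_normal, so here is a hedged sketch that combines both. It assumes pandas 0.25 or later for Series.explode; the names flat, unique_counts, means and result, and the output column labels, are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed: DatetimeIndex, list cells).
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=pd.to_datetime(
        [
            "2012-01-01 14:30",
            "2012-01-01 14:32",
            "2012-01-01 14:36",
            "2012-01-01 14:39",
            "2012-01-01 14:40",
        ]
    ),
)

# Flatten the lists, then count distinct elements in each 5-minute bin.
flat = df["col_weird"].explode()
unique_counts = flat.resample("5min").nunique()

# The numeric column can be resampled directly on the original frame.
means = df["col_normal"].resample("5min").mean()

# Both results share the same 5-minute bin index, so they align on concat.
result = pd.concat([unique_counts, means], axis=1)
result.columns = ["n_unique", "col_normal_mean"]
print(result)
```

For the 14:30 bin this gives 4 unique elements (A, B, C, D) and a mean of 3, matching the values expected in the question.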

Regarding python - Pandas : how to get the unique number of values in cells when cells contain lists?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38355931/
