python - Group by datetime interval using Pandas

Tags: python datetime pandas

I have this data:

data    id  url size    domain  subdomain
13/Jun/2016:06:27:26    30055   https://api.weather.com/v1/geocode/55.740002/37.610001/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks  3929    weather.com api.weather.com
13/Jun/2016:06:27:26    30055   https://api.weather.com/v1/geocode/54.720001/20.469999/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks  3845    weather.com api.weather.com
13/Jun/2016:06:27:27    3845    https://api.weather.com/v1/geocode/54.970001/73.370003/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks  30055   weather.com api.weather.com
13/Jun/2016:06:27:27    30055   https://api.weather.com/v1/geocode/59.919998/30.219999/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks  3914    weather.com api.weather.com
13/Jun/2016:06:27:28    30055   https://facebook.com    4005    facebook.com    facebook.com

I need to group it into 5-minute intervals. Desired output:

 data   id  url size    domain  subdomain
13/Jun/2016:06:27:26    30055   https://api.weather.com/v1/geocode/55.740002/37.610001/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks  3929    weather.com api.weather.com
13/Jun/2016:06:27:27    3845    https://api.weather.com/v1/geocode/54.970001/73.370003/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks  30055   weather.com api.weather.com
13/Jun/2016:06:27:28    30055   https://facebook.com    4005    facebook.com    facebook.com

I need to group by id and subdomain within each 5-minute interval. I tried using

print df.groupby([df['data'],pd.TimeGrouper(freq='Min')])

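to group by minute first, but it raises TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

Best answer

You need to parse data with pd.to_datetime() and an appropriate format setting, then use the result as the index. After that, use .groupby() to resample into 5-minute intervals and group by id and subdomain within each interval: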

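import pandas as pd
df.index = pd.to_datetime(df.data, format='%d/%b/%Y:%H:%M:%S')
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which was removed in pandas 1.0
df.groupby(pd.Grouper(freq='5min')).apply(lambda x: x.groupby(['id', 'subdomain']).first())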

                                                           data  \
data                id    subdomain                               
2016-06-13 06:25:00 3845  api.weather.com  13/Jun/2016:06:27:27   
                    30055 api.weather.com  13/Jun/2016:06:27:26   
                          facebook.com     13/Jun/2016:06:27:28   

                                                                                         url  \
data                id    subdomain                                                            
2016-06-13 06:25:00 3845  api.weather.com  https://api.weather.com/v1/geocode/54.970001/7...   
                    30055 api.weather.com  https://api.weather.com/v1/geocode/55.740002/3...   
                          facebook.com                                  https://facebook.com   

                                            size        domain  
data                id    subdomain                             
2016-06-13 06:25:00 3845  api.weather.com  30055   weather.com  
                    30055 api.weather.com   3929   weather.com  
                          facebook.com      4005  facebook.com 
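
For reference, the same idea can be sketched as a self-contained script that leaves the index alone and does the grouping in a single call. This is only one possible variant, not the answer's exact method: the column name ts is a hypothetical helper introduced here, the URLs are shortened (query strings dropped) for readability, and pd.Grouper(key=..., freq=...) is used instead of setting the index.

```python
import pandas as pd

# Sample rows modelled on the question's data; query strings are dropped
# from the URLs purely to keep the example short.
df = pd.DataFrame({
    "data": [
        "13/Jun/2016:06:27:26",
        "13/Jun/2016:06:27:26",
        "13/Jun/2016:06:27:27",
        "13/Jun/2016:06:27:27",
        "13/Jun/2016:06:27:28",
    ],
    "id": [30055, 30055, 3845, 30055, 30055],
    "url": [
        "https://api.weather.com/v1/geocode/55.740002/37.610001/aggregate.json",
        "https://api.weather.com/v1/geocode/54.720001/20.469999/aggregate.json",
        "https://api.weather.com/v1/geocode/54.970001/73.370003/aggregate.json",
        "https://api.weather.com/v1/geocode/59.919998/30.219999/aggregate.json",
        "https://facebook.com",
    ],
    "size": [3929, 3845, 30055, 3914, 4005],
    "domain": ["weather.com"] * 4 + ["facebook.com"],
    "subdomain": ["api.weather.com"] * 4 + ["facebook.com"],
})

# "ts" is a hypothetical helper column (not in the original data): the parsed
# timestamps, kept next to the raw string instead of replacing the index.
df["ts"] = pd.to_datetime(df["data"], format="%d/%b/%Y:%H:%M:%S")

# Group by 5-minute bin, id and subdomain in one call; first() keeps the
# first non-null value of each column within every group.
result = df.groupby([pd.Grouper(key="ts", freq="5min"), "id", "subdomain"]).first()
print(result)
```

With this sample the result has three groups, all in the 06:25 bin, one per (id, subdomain) pair. Passing key= to pd.Grouper avoids having a column and an index level both named data, which may be easier to work with downstream.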

Regarding python - Group by datetime interval using Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37845617/
