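I have a log file with a timestamp and two columns. I now want to resample and pivot the DataFrame created from the log file.
Example of the original DataFrame/log file: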
timestamp colA colB
2015-01-01 00:10:01 a x
2014-01-01 00:10:01 b y
2015-01-01 00:10:03 a x
2015-01-01 00:10:03 a x
2015-01-01 00:10:03 a y
2015-01-01 00:10:04 b x
2014-01-01 00:10:04 b y
2014-01-01 00:10:04 b y
2014-01-01 00:10:04 a x
2014-01-01 00:10:05 a x
2014-01-01 00:10:05 a x
2014-01-01 00:10:07 a y
2014-01-01 00:10:08 a x
Example result, resampled by second:
a b
timestamp x y x y
2015-01-01 00:10:01 1 0 0 1
2015-01-01 00:10:02 0 0 0 0
2015-01-01 00:10:03 2 1 0 0
2015-01-01 00:10:04 1 0 1 2
2014-01-01 00:10:05 2 0 0 0
2014-01-01 00:10:06 0 0 0 0
2014-01-01 00:10:07 0 1 0 0
2014-01-01 00:10:08 1 0 0 0
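How can I achieve this? Should I resample first and then group/pivot, or the other way around? More specifically, each cell should contain the count of one colA/colB combination within each resampling interval. The example uses seconds, but the interval could also be minutes, hours, and so on.
I am not tied to this format. I could also work with a resampled result grouped by timestamp/colA, such as: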
colB
timestamp colA x y
2015-01-01 00:10:01 a 1 0
b 0 1
2015-01-01 00:10:02 a 0 0
b 0 0
2015-01-01 00:10:03 a 2 1
b 0 0
2015-01-01 00:10:04 a 1 0
b 1 2
2014-01-01 00:10:05 a 2 0
b 0 0
2014-01-01 00:10:06 a 0 0
b 0 0
2014-01-01 00:10:07 a 0 1
b 0 0
2014-01-01 00:10:08 a 1 0
b 0 0
The end goal is to plot the different counts.
Thanks.
Best answer
You can use pd.crosstab:
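import pandas as pd
# 'data' is the log file; columns are assumed to be separated by two or more spaces
df = pd.read_table('data', sep=r'\s{2,}', engine='python', parse_dates=[0])
# Count each colA/colB combination per timestamp
table = pd.crosstab(index=[df['timestamp']], columns=[df['colA'], df['colB']])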
which yields
colA a b
colB x y x y
timestamp
2014-01-01 00:10:01 0 0 0 1
2014-01-01 00:10:04 1 0 0 2
2014-01-01 00:10:05 2 0 0 0
2014-01-01 00:10:07 0 1 0 0
2014-01-01 00:10:08 1 0 0 0
2015-01-01 00:10:01 1 0 0 0
2015-01-01 00:10:03 2 1 0 0
2015-01-01 00:10:04 0 0 1 0
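The crosstab above only has rows for timestamps that occur in the log, so seconds with no events are missing. One possible way to get the regularly spaced result from the question is to resample the crosstab and sum within each bin. The sketch below is not part of the original answer. It assumes every timestamp falls in the same year (the sample rows are rewritten to 2015 here, since resampling by second across 2014 and 2015 would create a row for every second in between), and it builds the frame from an inline string rather than the `data` file.

```python
import io
import pandas as pd

# Assumption: the question's rows with every timestamp set to 2015,
# written as CSV so the example is self-contained.
raw = """timestamp,colA,colB
2015-01-01 00:10:01,a,x
2015-01-01 00:10:01,b,y
2015-01-01 00:10:03,a,x
2015-01-01 00:10:03,a,x
2015-01-01 00:10:03,a,y
2015-01-01 00:10:04,b,x
2015-01-01 00:10:04,b,y
2015-01-01 00:10:04,b,y
2015-01-01 00:10:04,a,x
2015-01-01 00:10:05,a,x
2015-01-01 00:10:05,a,x
2015-01-01 00:10:07,a,y
2015-01-01 00:10:08,a,x
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["timestamp"])

# Count each colA/colB combination per distinct timestamp.
table = pd.crosstab(index=df["timestamp"], columns=[df["colA"], df["colB"]])

# Resample to one-second bins; bins without events sum to 0.
# Other intervals work the same way, e.g. "1min" or "1h".
per_second = table.resample("1s").sum()

# Alternative layout: (timestamp, colA) as rows, colB as columns.
stacked = per_second.stack(level="colA")

print(per_second)
print(stacked)
```

Since the stated goal is plotting, `per_second.plot()` should draw one line per colA/colB combination, provided matplotlib is installed.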
For more on python - resampling, grouping, and pivoting a pandas DataFrame, see the similar question on Stack Overflow: https://stackoverflow.com/questions/28868666/