python - How do I check similarity between rows in a dataframe and add a column as a counter that increments when rows match?

I'm a bit new to Python (pandas), so please help me solve this problem.

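This is what my dataframe looks like. Device_Id is the ID of the device that reported the message (Msg) at the given time (e.g. 1524724677); Time is in epoch seconds.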

  Device_Id    Msg                Time
0  ABC123     connected        1524724677
1  ABC123     connected        1524724679
2  XYZ123     device failed    1524724814
3  ABC123     connected        1524725279
4  XVZ123     device failed    1524725300
5  PQR123      error           1524725325
6  ABC123     connected        1524725345
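To reproduce the examples, the sample frame can be built directly. A minimal sketch, assuming Time holds epoch seconds as plain integers, as in the table:

```python
import pandas as pd

# Sample data from the question; Time is in epoch seconds
df = pd.DataFrame({
    "Device_Id": ["ABC123", "ABC123", "XYZ123", "ABC123", "XVZ123", "PQR123", "ABC123"],
    "Msg": ["connected", "connected", "device failed", "connected",
            "device failed", "error", "connected"],
    "Time": [1524724677, 1524724679, 1524724814, 1524725279,
             1524725300, 1524725325, 1524725345],
})
print(df)
```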

I need to process every row of the dataframe so that I can add some new columns.

The dataframe I want looks like this:

  Device_Id    Msg                Time       count
0  ABC123     connected        1524724677      1
1  ABC123     connected        1524724679      2
2  XYZ123     device failed    1524724814      1
3  ABC123     connected        1524725279      1
4  XVZ123     device failed    1524725300      1
5  PQR123      error           1524725325      1
6  ABC123     connected        1524725345      2

This is how the count column works, for example (please read all the points to see exactly how it is computed):

--for row(0), count is (1) because this is a unique device.
--the counter increases with respect to (Time).
--the counter values are reset every 10 minutes.
--for row(1), count is (2) because time (1524724679) is between
  1524724677 and 1524724677 + 10 minutes.
--for row(2), count is (1): it is a unique device, and time (1524724814)
  is between 1524724677 and 1524724677 + 10 minutes.
--for row(3), count is (1) even though the device is not unique,
  because time (1524725279) is not between 1524724677 and 1524724677 + 10
  minutes (counter reset).
--for row(4), count is (1): it is a unique device, and time (1524725300) is between
  1524725279 and 1524725279 + 10 minutes.
--for row(5), count is (1): it is a unique device, and time (1524725325) is between 1524725279
  and 1524725279 + 10 minutes.
--for row(6), count is (2) because time (1524725345) is between 1524725279
  and 1524725279 + 10 minutes, and ABC123 already appeared in that window (row 3).
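The reset rule in these points can be checked with plain arithmetic: a window opens at the first timestamp, covers the next 600 seconds, and the first timestamp past that end opens a new window for every device. A minimal sketch in plain Python; the question does not say whether the end point itself is included, so this assumes it is (only `t > start + WINDOW` opens a new window). `WINDOW` and `window_ids` are illustrative names.

```python
times = [1524724677, 1524724679, 1524724814, 1524725279,
         1524725300, 1524725325, 1524725345]
WINDOW = 10 * 60  # 10 minutes in seconds

window_ids = []
start = None
window_id = -1
for t in times:
    # Open a new window on the first row, or once t is past start + 10 minutes
    if start is None or t > start + WINDOW:
        start = t
        window_id += 1
    window_ids.append(window_id)
    # Offset from the start of the window this row falls in
    print(t, "window", window_id, "offset", t - start, "s")
```

Row 3 is 602 seconds after row 0, just past the 600-second limit, so it opens the second window.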

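The count value is reset every 10 minutes, which means each device_id starts again from (1).

After every 10 minutes, each device_id is treated as new; that is why the count starts again from 1 and keeps accumulating over the next 10 minutes.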
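The count described here can be computed by assigning each row to one of these anchored windows and numbering the rows of each (Device_Id, window) group with `cumcount`. This is a sketch of the rule as stated in the question, not the approach taken in the best answer; the helper `anchored_windows` is hypothetical, and it assumes the windows are shared by all devices (as in the points, where row 3 resets the window for every device) and that a window's end point is inclusive.

```python
import pandas as pd

df = pd.DataFrame({
    "Device_Id": ["ABC123", "ABC123", "XYZ123", "ABC123", "XVZ123", "PQR123", "ABC123"],
    "Msg": ["connected", "connected", "device failed", "connected",
            "device failed", "error", "connected"],
    "Time": [1524724677, 1524724679, 1524724814, 1524725279,
             1524725300, 1524725325, 1524725345],
})

WINDOW = 600  # seconds (assumed: 10 minutes, end point inclusive)


def anchored_windows(times, width):
    """Hypothetical helper: a window opens at a row's time and lasts `width` seconds."""
    ids, start, wid = [], None, -1
    for t in times:
        if start is None or t > start + width:
            start, wid = t, wid + 1
        ids.append(wid)
    return ids


df = df.sort_values("Time")  # the rule assumes chronological order
df["window"] = anchored_windows(df["Time"].tolist(), WINDOW)
# cumcount numbers rows within each group from 0, so add 1
df["count"] = df.groupby(["Device_Id", "window"]).cumcount() + 1
print(df)
```

The loop is needed because each window's start depends on where the previous one ended, which a fixed-size bin cannot express.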

Best answer

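You can solve this easily with groupby and pd.Grouper: the Grouper splits Time into 10-minute bins, and cumcount numbers the rows of each (Device_Id, bin) group starting from 0, hence the +1.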

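import pandas as pd

# df is the DataFrame from the question; convert epoch seconds to datetime
df['Time'] = pd.to_datetime(df['Time'], unit='s')

# number the rows within each (Device_Id, 10-minute bin) group; cumcount starts at 0
df['count'] = df.groupby(['Device_Id', pd.Grouper(key='Time', freq='10min')]).cumcount() + 1

print(df)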

  Device_Id            Msg                Time  count
0    ABC123      connected 2018-04-26 06:37:57      1
1    ABC123      connected 2018-04-26 06:37:59      2
2    XYZ123  device failed 2018-04-26 06:40:14      1
3    ABC123      connected 2018-04-26 06:47:59      1
4    XVZ123  device failed 2018-04-26 06:48:20      1
5    PQR123          error 2018-04-26 06:48:45      1
6    ABC123      connected 2018-04-26 06:49:05      2
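Note that pd.Grouper(key='Time', freq='10min') cuts Time into fixed, clock-aligned 10-minute bins (06:30 to 06:40, 06:40 to 06:50, and so on), not windows that start at the first message after a reset. A sketch that makes the bins visible with `Series.dt.floor`, which should give the same bin edges on this data; the column name `bin` is illustrative:

```python
import pandas as pd

# Only the columns needed for the count; Time converted from epoch seconds
df = pd.DataFrame({
    "Device_Id": ["ABC123", "ABC123", "XYZ123", "ABC123", "XVZ123", "PQR123", "ABC123"],
    "Time": pd.to_datetime([1524724677, 1524724679, 1524724814, 1524725279,
                            1524725300, 1524725325, 1524725345], unit="s"),
})

# Left edge of the clock-aligned 10-minute bin each row falls in
df["bin"] = df["Time"].dt.floor("10min")
df["count"] = df.groupby(["Device_Id", "bin"]).cumcount() + 1
print(df)
```

With clock-aligned bins, row 2 (XYZ123 at 06:40:14) shares a bin with rows 3 to 6, whereas under the rule in the question it belongs to the first window, which ends at 06:47:57. On this sample the counts still match because XYZ123 appears only once, but on other data the two rules can disagree.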

A similar question about this topic can be found on Stack Overflow: https://stackoverflow.com/questions/50212175/
