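I have a collection like the following:
What I want is to return all of the data, but merge records created by the same user within one minute. For example, amy123 first retrieved and then opened; these 2 records were added within 1 minute, so I only want to return amy123's latest record in that minute. Also, although bob1 has a record within that same minute, it is a different user, so it should be shown as well.
The final output should look like this:
How can I do this in a MongoDB pipeline, or in Python with JSON?
My idea is to first check whether the previous record has the same username. If it does, I check the time difference. If the difference is <= 1 minute, I return only the latest record. If not, I return all records. However, I don't know how to code this.
Thanks in advance!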
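The idea described in the question (compare each record with the previous one for the same user, and keep only the newer record when they are at most one minute apart) could be sketched in plain Python roughly as follows. The sample collection is not shown here, so the field names `username`, `ts` and `action` and the helper name `keep_latest_per_burst` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def keep_latest_per_burst(records, gap=timedelta(minutes=1)):
    """Keep only the last record of each run of same-user records in which
    every record is within `gap` of the one before it.

    Field names 'username' and 'ts' are assumed, not taken from the question.
    """
    kept = []
    # Sort by user, then time, so each user's records are adjacent and ordered.
    for rec in sorted(records, key=lambda r: (r["username"], r["ts"])):
        prev = kept[-1] if kept else None
        same_user = prev is not None and prev["username"] == rec["username"]
        if same_user and rec["ts"] - prev["ts"] <= gap:
            kept[-1] = rec      # newer record replaces the previous one
        else:
            kept.append(rec)    # different user, or gap too large
    return kept

records = [
    {"username": "amy123", "ts": datetime(2020, 1, 1, 10, 0, 0), "action": "retrieve"},
    {"username": "amy123", "ts": datetime(2020, 1, 1, 10, 0, 30), "action": "open"},
    {"username": "bob1", "ts": datetime(2020, 1, 1, 10, 0, 40), "action": "retrieve"},
]
for rec in keep_latest_per_burst(records):
    print(rec["username"], rec["action"])
```

Note that comparing against the previous record chains gaps together: a long series of records each 50 seconds apart collapses into a single record. Whether that is desirable, or whether fixed one-minute windows are wanted instead, depends on the use case.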
Best answer
This is a good fit for $reduce used as a "for loop with state". Consider this input set:
{name: "amy", ts: new ISODate("2020-01-01T00:00:20Z"), o1:1, o2:"X1"}
,{name: "amy", ts: new ISODate("2020-01-01T00:00:30"), o1:2, o2:"X2"}
,{name: "amy", ts: new ISODate("2020-01-01T00:00:58"), o1:3, o2:"X3"}
,{name: "amy", ts: new ISODate("2020-01-01T00:01:15"), o1:31, o2:"X31"}
,{name: "amy", ts: new ISODate("2020-01-01T00:01:30"), o1:32, o2:"X32"}
,{name: "amy", ts: new ISODate("2020-01-01T00:02:00"), o1:4, o2:"X4"}
,{name: "amy", ts: new ISODate("2020-01-01T00:02:40"), o1:5, o2:"X5"}
,{name: "amy", ts: new ISODate("2020-01-01T00:04:00"), o1:65, o2:"X65"}
,{name: "amy", ts: new ISODate("2020-01-01T00:04:10"), o1:75, o2:"X75"}
,{name: "amy", ts: new ISODate("2020-01-01T00:20:35"), o1:86, o2:"X86"}
,{name: "amy", ts: new ISODate("2020-01-01T00:20:36"), o1:96, o2:"X96"}
,{name: "bob", ts: new ISODate("2020-01-01T00:00:30"), o1:7, o2:"X7"}
,{name: "bob", ts: new ISODate("2020-01-01T00:01:30"), o1:8, o2:"X8"}
,{name: "bob", ts: new ISODate("2020-01-01T00:01:35"), o1:9, o2:"X9"}
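o1 and o2 are placeholders for other data and are not directly part of the one-minute bucketing scheme. They are carried along automatically by the solution; any number of other fields of any type can be carried without changing the query.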
db.foo.aggregate([
    // If appropriate, start with a $match stage here to cut down the amount
    // of material, especially dates. You probably do not want 1 min buckets
    // of everything from day 1. For now, no $match.

    // Ensure everything goes in in date order:
    {$sort: {ts:1}},

    // Group by name and push the whole doc (which is in date order) onto
    // array 'a':
    {$group: {_id:"$name", a: {$push: "$$CURRENT"}}},

    // Next, iterate over 'a' using $reduce and rebuild it with our 1 min
    // buckets, then overwrite the old 'a' using $addFields:
    {$addFields: {a: {$let: {
        // Get the first element of 'a' in prep for setting the init value:
        vars: {sd: {$arrayElemAt:["$a",0]}},
        in: {$reduce: {
            input: "$a",
            initialValue: {
                prev:"$$sd",     // the whole doc
                last:"$$sd.ts",  // last anchor date, i.e. start of the 60-second interval
                accum: []
            },
            in: {$cond: [
                // If this ts is < 60000 millis beyond the anchor...
                {$lt:[{$subtract:["$$this.ts", "$$value.last"]}, 60000]},

                // ...then capture it as prev but let the last anchor
                // and the accumulated hits carry forward unchanged:
                {prev: "$$this",
                 last: "$$value.last",   // carry
                 accum: "$$value.accum"  // carry
                },

                // ...else capture it AND reset the anchor AND append the
                // previous value to the accum array (because at this
                // point we "overran" the 60-second interval). Note that
                // $concatArrays wants arrays, not docs, as inputs so we
                // must wrap $$value.prev (which is a $$CURRENT doc) with []:
                {prev: "$$this",
                 last: "$$this.ts",  // reset last to this one
                 accum: {$concatArrays: [ "$$value.accum", ["$$value.prev"] ] }
                }
            ]
            }
        }}
    }}
    }},

    // Our use of $reduce will always leave the very last value (which
    // we always take) "dangling" in prev, so here we simply do one more concat.
    // We also take the opportunity to "lift" 'a.accum' to just 'a' and, in so
    // doing, get rid of 'prev' and 'last':
    {$addFields: {a: {$concatArrays: [ "$a.accum", [ "$a.prev" ]]} }}
]);
This produces:
{
"_id" : "amy",
"a" : [
{
"_id" : ObjectId("61f7e34ba565bb368b38e2cb"),
"name" : "amy",
"ts" : ISODate("2020-01-01T00:01:15Z"),
"o1" : 31,
"o2" : "X31"
},
{
"_id" : ObjectId("61f7e34ba565bb368b38e2cd"),
"name" : "amy",
"ts" : ISODate("2020-01-01T00:02:00Z"),
"o1" : 4,
"o2" : "X4"
},
{
"_id" : ObjectId("61f7e34ba565bb368b38e2ce"),
"name" : "amy",
"ts" : ISODate("2020-01-01T00:02:40Z"),
"o1" : 5,
"o2" : "X5"
},
{
"_id" : ObjectId("61f7e34ba565bb368b38e2d0"),
"name" : "amy",
"ts" : ISODate("2020-01-01T00:04:10Z"),
"o1" : 75,
"o2" : "X75"
},
{
"_id" : ObjectId("61f7e34ba565bb368b38e2d2"),
"name" : "amy",
"ts" : ISODate("2020-01-01T00:20:36Z"),
"o1" : 96,
"o2" : "X96"
}
]
}
{
"_id" : "bob",
"a" : [
{
"_id" : ObjectId("61f7e34ba565bb368b38e2d3"),
"name" : "bob",
"ts" : ISODate("2020-01-01T00:00:30Z"),
"o1" : 7,
"o2" : "X7"
},
{
"_id" : ObjectId("61f7e34ba565bb368b38e2d5"),
"name" : "bob",
"ts" : ISODate("2020-01-01T00:01:35Z"),
"o1" : 9,
"o2" : "X9"
}
]
}
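Since the question also asks about a Python option, the same "for loop with state" can be mirrored outside the database with `functools.reduce`. This is only a sketch: the state dictionary copies the `prev`, `last` and `accum` fields used in the pipeline, while the helper names `step`, `latest_per_window` and `t` are made up for this example. The sample documents are the ones from the input set, with `o2` omitted for brevity.

```python
from datetime import datetime, timedelta
from functools import reduce
from itertools import groupby

WINDOW = timedelta(seconds=60)

def step(state, doc):
    """One $reduce iteration; state mirrors {prev, last, accum}."""
    if doc["ts"] - state["last"] < WINDOW:
        # Still inside the window: remember doc as latest, carry the rest.
        return {"prev": doc, "last": state["last"], "accum": state["accum"]}
    # Window overrun: emit the previous doc and re-anchor at this doc.
    return {"prev": doc, "last": doc["ts"],
            "accum": state["accum"] + [state["prev"]]}

def latest_per_window(docs):
    """Return {name: [docs]}, keeping the last doc of each 60 s window."""
    result = {}
    ordered = sorted(docs, key=lambda d: (d["name"], d["ts"]))
    for name, group in groupby(ordered, key=lambda d: d["name"]):
        group = list(group)
        first = group[0]
        init = {"prev": first, "last": first["ts"], "accum": []}
        final = reduce(step, group, init)
        # The very last doc is left "dangling" in prev, so append it.
        result[name] = final["accum"] + [final["prev"]]
    return result

def t(minute, second):
    return datetime(2020, 1, 1, 0, minute, second)

docs = [{"name": n, "ts": t(m, s), "o1": o1} for n, m, s, o1 in [
    ("amy", 0, 20, 1), ("amy", 0, 30, 2), ("amy", 0, 58, 3),
    ("amy", 1, 15, 31), ("amy", 1, 30, 32), ("amy", 2, 0, 4),
    ("amy", 2, 40, 5), ("amy", 4, 0, 65), ("amy", 4, 10, 75),
    ("amy", 20, 35, 86), ("amy", 20, 36, 96),
    ("bob", 0, 30, 7), ("bob", 1, 30, 8), ("bob", 1, 35, 9),
]]
kept = latest_per_window(docs)
print({name: [d["o1"] for d in ds] for name, ds in kept.items()})
# {'amy': [31, 4, 5, 75, 96], 'bob': [7, 9]}
```

The `o1` values match the documents in the pipeline output above. As in the pipeline, the window is anchored on the first record that overran the previous window, not on the immediately preceding record.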
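If you do not want or need a general solution, you can push a subset document instead of pushing $$CURRENT, for example:
{$group: {_id:"$name", a: {$push: {ts: "$ts", o1: "$o1"}} }},
The rest of the pipeline stays the same, but you must always include the field ts in the a array for the logic to work correctly.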
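If the pipeline is driven from Python, the choice between pushing the whole document and pushing a subset can be made a parameter. The sketch below builds the same four stages as plain Python data; `build_pipeline`, `keep_fields` and `window_ms` are hypothetical names introduced here, and `ts` is always included because the $reduce logic depends on it.

```python
def build_pipeline(keep_fields=None, window_ms=60000):
    """Build the aggregation pipeline from the answer as plain Python data.

    keep_fields: extra field names to carry alongside 'ts'; None pushes the
    whole document ($$CURRENT). These parameter names are illustrative.
    """
    if keep_fields is None:
        pushed = "$$CURRENT"
    else:
        # dict.fromkeys de-duplicates while keeping order; 'ts' comes first.
        names = dict.fromkeys(["ts", *keep_fields])
        pushed = {name: "$" + name for name in names}

    reduce_expr = {"$reduce": {
        "input": "$a",
        "initialValue": {"prev": "$$sd", "last": "$$sd.ts", "accum": []},
        "in": {"$cond": [
            # Still within window_ms of the anchor?
            {"$lt": [{"$subtract": ["$$this.ts", "$$value.last"]}, window_ms]},
            # Yes: remember this doc, carry anchor and accum unchanged.
            {"prev": "$$this", "last": "$$value.last", "accum": "$$value.accum"},
            # No: re-anchor here and append the previous doc to accum.
            {"prev": "$$this", "last": "$$this.ts",
             "accum": {"$concatArrays": ["$$value.accum", ["$$value.prev"]]}},
        ]},
    }}
    first = {"$arrayElemAt": ["$a", 0]}
    return [
        {"$sort": {"ts": 1}},
        {"$group": {"_id": "$name", "a": {"$push": pushed}}},
        {"$addFields": {"a": {"$let": {"vars": {"sd": first}, "in": reduce_expr}}}},
        {"$addFields": {"a": {"$concatArrays": ["$a.accum", ["$a.prev"]]}}},
    ]

pipeline = build_pipeline(["o1"])
print(pipeline[1])
```

Assuming a PyMongo `Collection` object named `coll` (not shown here), the result could then be passed to `coll.aggregate(pipeline)`.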
Try it at mongoplayground.net.
Regarding "python - How to compare records row by row and remove rows that do not meet a condition?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70916532/