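I have 300,000 documents in this particular collection. Each document represents one taxi trip and contains a TaxiStation number and a TaxiLicense number.
My goal is to count the number of trips per taxi license per taxi station.
For example:
TaxiStation A, License X had 5 trips.
TaxiStation A, License Y had 9 trips. And so on.
How can I optimize my query? It takes up to 30 minutes to complete!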
List /*of*/ taxistationOfCollection, taxiLicenseOfTaxistation;
//Here I get all the distinct TaxiStation numbers in the collection
taxistationOfCollection = coll.distinct("TaxiStation");
BasicDBObject query, tripquery;
int tripcount;
//Now I have to loop through each Taxi Station
for (int i = 0; i < taxistationOfCollection.size(); i++)
{
    query = new BasicDBObject("TaxiStation", taxistationOfCollection.get(i));
    //Here, I make a list of each distinct Taxi License in the current Taxi station
    taxiLicenseOfTaxistation = coll.distinct("TaxiLicense", query);
    //Now I make a loop to process each Taxi License within the current Taxi station
    for (int k = 0; k < taxiLicenseOfTaxistation.size(); k++)
    {
        tripcount = 0;
        if (taxiLicenseOfTaxistation.get(k) != null)
        {
            //I'm looking for each trip of this Taxi Station with this Taxi License
            tripquery = new BasicDBObject("TaxiStation", taxistationOfCollection.get(i)).append("TaxiLicense", taxiLicenseOfTaxistation.get(k));
            DBCursor cursor = coll.find(tripquery);
            try {
                while (cursor.hasNext()) {
                    //Increasing my counter every time I find a match
                    tripcount++;
                    cursor.next();
                }
            } finally {
                //Release the cursor, then print the results
                cursor.close();
                System.out.println("Station: " + taxistationOfCollection.get(i) + " License:" + taxiLicenseOfTaxistation.get(k)
                        + " Trips: " + tripcount);
            }
        }
    }
}
Sample document:
{
"_id" : ObjectId("53df46ed9b2ed78fb7ca4f23"),
"Version" : "2",
"Display" : [],
"Generated" : "2014-08-04,16:40:05",
"GetOff" : "2014-08-04,16:40:05",
"GetOffCellInfo" : "46001,43027,11237298",
"Undisplay" : [],
"TaxiStation" : "0000",
"GetOn" : "2014-08-04,16:40:03",
"GetOnCellInfo" : "46001,43027,11237298",
"TaxiLicense" : "000000",
"TUID" : "26921876-3bd5-432e-a014-df0fb26c0e6c",
"IMSI" : "460018571356892",
"MCU" : "CM8001MA121225V1",
"System_ID" : "000",
"MeterGetOffTime" : "",
"MeterGetOnTime" : "",
"Setup" : [],
"MeterSID" : "",
"MeterWaitTime" : "",
"OS" : "4.2",
"PackageVersion" : "201407300888",
"PublishVersion" : "201312060943",
"SWVersion" : "rel_touchbox_20101010",
"MeterMile" : 0,
"MeterCharged" : 0,
"GetOnLongitude" : 0,
"GetOnLatitude" : 0,
"GetOffLongitude" : 0,
"TripLength" : 2,
"GetOffLatitude" : 0,
"Clicks" : 0,
"updateTime" : "2014-08-04 16:40:10"
}
Best answer
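Aggregation is probably what you are looking for. With an aggregation operation, all of this work runs inside the database and can be expressed in a few lines. Performance should also be much better: the server does the grouping and counting itself, instead of the client issuing one query per station/license pair, and it can use indexes where they apply.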
Based on what you posted, this boils down to a simple $group operation. In the shell, it would look like this:
// Replace taxistationOfCollection with the actual name of your collection
db.taxistationOfCollection.aggregate([
    {$group:
        { _id:
            {station: "$TaxiStation",
             licence: "$TaxiLicense"},
          count : {$sum : 1}
        }
    }
])
This gives you documents of the form:
{_id : {station: stationid, licence: licence_number}, count: number_of_documents}
For Java, it would look like this:
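// Requires com.mongodb.AggregationOutput, BasicDBObject, DBObject and java.util.Arrays
// Build the $group stage: group by (TaxiStation, TaxiLicense) and count the documents per group
DBObject taxigroup = new BasicDBObject("$group",
        new BasicDBObject("_id",
                new BasicDBObject("station", "$TaxiStation")
                        .append("licence", "$TaxiLicense"))
                .append("count", new BasicDBObject("$sum", 1)));
// coll is the DBCollection used in the question
AggregationOutput aggout = coll.aggregate(
        Arrays.asList(taxigroup));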
Note that the code snippets are untested.
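To see what a $group stage with {$sum : 1} computes, the following minimal sketch reproduces the same grouping in memory with plain Java collections and no MongoDB driver. The class TripCountSketch, the Trip holder and the countTrips method are hypothetical names introduced only for illustration, and the sample trips in main are made up. Skipping null licences mirrors the null check in the question's loop; the pipeline above does not do that by itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TripCountSketch {

    // Hypothetical stand-in for one trip document; only the two grouped fields are kept.
    static final class Trip {
        final String station;
        final String licence;

        Trip(String station, String licence) {
            this.station = station;
            this.licence = licence;
        }
    }

    // Groups trips by the pair (station, licence) and counts each group,
    // which is what {$group: {_id: {station, licence}, count: {$sum: 1}}} does.
    static Map<List<String>, Long> countTrips(List<Trip> trips) {
        return trips.stream()
                // assumption: drop null licences, as the question's loop does
                .filter(t -> t.licence != null)
                .collect(Collectors.groupingBy(
                        t -> Arrays.asList(t.station, t.licence),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the real collection
        List<Trip> trips = Arrays.asList(
                new Trip("A", "X"),
                new Trip("A", "X"),
                new Trip("A", "Y"),
                new Trip("B", "X"),
                new Trip("B", null));

        // Each key is a [station, licence] list; iteration order is not guaranteed
        for (Map.Entry<List<String>, Long> e : countTrips(trips).entrySet()) {
            System.out.println("Station: " + e.getKey().get(0)
                    + " License: " + e.getKey().get(1)
                    + " Trips: " + e.getValue());
        }
    }
}
```

This is a single pass over the data, which is the key difference from the nested loops in the question. For the real collection the aggregation pipeline is still the better fit, because it avoids transferring all 300,000 documents to the client.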
Source: "java - How to optimize a MongoDB query" on Stack Overflow: https://stackoverflow.com/questions/26881468/