java - Why does the time to count entities of kind X in the GAE datastore increase as the total number of entities grows?

Tags: java google-app-engine google-cloud-datastore objectify

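I ran some tests that count the entities of kind X in the Google App Engine datastore, with the count capped at a limit of 5000. To my surprise, the time this operation takes increases as the total number of X entities in the datastore increases.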

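If the count operation simply walks an index over the entities' keys, shouldn't the time be constant regardless of the total number of X entities in the datastore (as long as the total is above 5000)?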
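The reasoning behind this expectation can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a kind's keys live in a sorted index (modeled here as a `TreeSet`); this is not how the datastore actually stores its indexes. A limited count walks the index in key order and stops after `limit` entries, so the number of entries visited does not depend on the index size. The class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model only: a kind's key index is represented by a sorted TreeSet of ids.
// This is an assumption for illustration, not how the datastore stores indexes.
final class IndexCountSketch {

    /** Walks the index in key order and stops once `limit` entries have been visited. */
    static int countWithLimit(NavigableSet<Long> index, int limit) {
        int visited = 0;
        Iterator<Long> it = index.iterator();
        while (visited < limit && it.hasNext()) {
            it.next();      // visit one index entry
            visited++;
        }
        // At most `limit` entries are visited, no matter how large the index is.
        return visited;
    }

    /** Builds a hypothetical index containing the keys 1..total. */
    static NavigableSet<Long> buildIndex(int total) {
        NavigableSet<Long> index = new TreeSet<>();
        for (long k = 1; k <= total; k++) {
            index.add(k);
        }
        return index;
    }
}
```

In this model, `countWithLimit(buildIndex(100_000), 5000)` visits the same 5000 entries as `countWithLimit(buildIndex(10_000), 5000)`; only building the index grows with the total.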

[Note: This is not about whether to use sharded counters or the datastore statistics; it is about whether my test results are counterintuitive.]

Update 1: The tests were run on the devserver.

Here is some data (individual run times are in milliseconds):

Time to create & save 100000 entities: 35.92 s
Using Objectify:
Individual times of 10 runs: 14795, 9521, 9300, 9117, 9848, 9391, 8378, 8525, 8593, 8706
Average time to count 5000 entities over 10 runs: 9.617 seconds
--------------------------------------------------------------------------------
Using Datastore:
Individual times of 10 runs: 8984, 8827, 9062, 9160, 8768, 8737, 8488, 8523, 8828, 8956
Average time to count 5000 entities over 10 runs: 8.833 seconds
--------------------------------------------------------------------------------

Time to create & save 50000 entities: 20.03 s
Using Objectify:
Individual times of 10 runs: 5877, 4736, 4162, 4252, 4126, 4203, 4153, 4168, 4051, 4110
Average time to count 5000 entities over 10 runs: 4.384 seconds
--------------------------------------------------------------------------------
Using Datastore:
Dec 16, 2015 10:00:36 AM in.co.amebatechnologies.empireapp.test.DatastoreTests tearDown
INFO: Closing this session
Individual times of 10 runs: 4409, 4380, 4577, 4414, 4121, 4050, 4076, 4050, 4089, 4148
Average time to count 5000 entities over 10 runs: 4.231 seconds
--------------------------------------------------------------------------------

Time to create & save 10000 entities: 8.989 s
Using Objectify:
Individual times of 10 runs: 1893, 802, 713, 678, 679, 657, 648, 654, 659, 654
Average time to count 5000 entities over 10 runs: 0.804 seconds
--------------------------------------------------------------------------------
Using Datastore:
Individual times of 10 runs: 923, 789, 871, 680, 677, 694, 680, 682, 728, 682
Average time to count 5000 entities over 10 runs: 0.741 seconds
--------------------------------------------------------------------------------
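A quick way to see how the devserver timings scale is to divide each average count time by the number of 10,000-entity blocks stored. If the count time were independent of the total, these ratios would shrink tenfold between 10,000 and 100,000 entities; instead they stay roughly between 0.74 and 0.96 seconds, which suggests the time grows roughly in proportion to the total. The averages below are copied from the output above; the class name is hypothetical.

```java
import java.util.Arrays;

// Average devserver times (seconds) copied from the output above, per total entity count.
final class DevserverScaling {
    static final int[] TOTALS = {10_000, 50_000, 100_000};
    static final double[] OBJECTIFY_AVG = {0.804, 4.384, 9.617};
    static final double[] DATASTORE_AVG = {0.741, 4.231, 8.833};

    /** Divides each average count time by the number of 10,000-entity blocks stored. */
    static double[] secondsPer10k(double[] avgSeconds) {
        double[] out = new double[TOTALS.length];
        for (int i = 0; i < TOTALS.length; i++) {
            out[i] = avgSeconds[i] / (TOTALS[i] / 10_000.0);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Objectify: " + Arrays.toString(secondsPer10k(OBJECTIFY_AVG)));
        System.out.println("Datastore: " + Arrays.toString(secondsPer10k(DATASTORE_AVG)));
    }
}
```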

Versions used:

  • GAE SDK 1.9.30
  • Objectify 5.1.7

Code that counts the entities directly through the datastore API (i.e., without Objectify):

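import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.PreparedQuery;

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
// Keys-only query; no kind is passed to the constructor, so it is a kindless query
com.google.appengine.api.datastore.Query qry = new com.google.appengine.api.datastore.Query();
qry.setKeysOnly();
PreparedQuery prepQry = ds.prepare(qry);
// Stop after 5000 results; fetch keys in batches of 1000
FetchOptions fetchOpts = FetchOptions.Builder.withOffset(0).limit(5000).chunkSize(1000);
// Time this operation only:
int count = prepQry.countEntities(fetchOpts);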
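The timing around the `countEntities` call can be reproduced with a small harness like the one below, which mirrors the "Individual times of 10 runs" and "Average time" lines in the output above. This is a minimal sketch; `CountTimer`, `timeRuns` and `averageSeconds` are hypothetical names, not part of the App Engine SDK or Objectify, and the original test code is not shown in the question.

```java
import java.util.function.IntSupplier;

// Hypothetical timing helper; it measures wall-clock time around any int-returning operation.
final class CountTimer {

    /** Runs `op` `runs` times and returns each run's wall-clock time in milliseconds. */
    static long[] timeRuns(IntSupplier op, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.getAsInt();      // e.g. prepQry.countEntities(fetchOpts)
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    /** Mean of the recorded times, converted from milliseconds to seconds. */
    static double averageSeconds(long[] millis) {
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return sum / (double) millis.length / 1000.0;
    }
}
```

With the datastore snippet above in scope, a call such as `CountTimer.timeRuns(() -> prepQry.countEntities(fetchOpts), 10)` would time ten counts. Feeding the first Objectify series from the 100,000-entity test into `averageSeconds` gives 9.6174, matching the reported 9.617 seconds.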

Best answer

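The development server provides a local emulation of the production environment, including the datastore (as documented here). However, the emulated datastore does not have the same performance characteristics as production for large data sets.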
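One way a limited count could end up scaling with the total number of entities is if a store examines every stored entity before applying the limit. The sketch below is a hypothetical model of such a store, for contrast with an index walk that stops early; it is not a description of the dev server's actual implementation, and all names in it are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a store that filters every entity before applying the limit.
// It is NOT a description of the dev server's internals; it only shows how such a
// design would make a limited count scale with the total number of stored entities.
final class FullScanSketch {

    /** Minimal stand-in for a stored entity: a kind name and a numeric id. */
    static final class Entity {
        final String kind;
        final long id;

        Entity(String kind, long id) {
            this.kind = kind;
            this.id = id;
        }
    }

    /** Result of a count: the (capped) count and how many entities were examined. */
    static final class CountResult {
        final int count;
        final long examined;

        CountResult(int count, long examined) {
            this.count = count;
            this.examined = examined;
        }
    }

    /** Collects all matches first, then truncates to `limit`. */
    static CountResult countWithLimit(List<Entity> entities, String kind, int limit) {
        List<Entity> matches = new ArrayList<>();
        long examined = 0;
        for (Entity e : entities) {      // every stored entity is visited...
            examined++;
            if (e.kind.equals(kind)) {
                matches.add(e);
            }
        }
        // ...even though at most `limit` results are reported.
        return new CountResult(Math.min(matches.size(), limit), examined);
    }

    /** Builds `total` hypothetical entities of one kind. */
    static List<Entity> build(String kind, int total) {
        List<Entity> list = new ArrayList<>(total);
        for (long i = 1; i <= total; i++) {
            list.add(new Entity(kind, i));
        }
        return list;
    }
}
```

Both 10,000 and 100,000 stored entities yield a count of 5000 here, but the second case examines ten times as many entities.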

When I ran your code against the production datastore (with between 10,000 and 100,000 entities), the average time over 10 runs stayed consistent:

Total Entities: 20000
Run 0: Time to count 5000 entities: 242
Run 1: Time to count 5000 entities: 352
Run 2: Time to count 5000 entities: 215
Run 3: Time to count 5000 entities: 244
Run 4: Time to count 5000 entities: 241
Run 5: Time to count 5000 entities: 221
Run 6: Time to count 5000 entities: 258
Run 7: Time to count 5000 entities: 219
Run 8: Time to count 5000 entities: 260
Run 9: Time to count 5000 entities: 219
Average: 247.1

Total Entities: 50000
Run 0: Time to count 5000 entities: 346
Run 1: Time to count 5000 entities: 236
Run 2: Time to count 5000 entities: 214
Run 3: Time to count 5000 entities: 353
Run 4: Time to count 5000 entities: 244
Run 5: Time to count 5000 entities: 229
Run 6: Time to count 5000 entities: 244
Run 7: Time to count 5000 entities: 257
Run 8: Time to count 5000 entities: 216
Run 9: Time to count 5000 entities: 224
Average: 256.3

Total Entities: 100000
Run 0: Time to count 5000 entities: 215
Run 1: Time to count 5000 entities: 212
Run 2: Time to count 5000 entities: 329
Run 3: Time to count 5000 entities: 217
Run 4: Time to count 5000 entities: 230
Run 5: Time to count 5000 entities: 231
Run 6: Time to count 5000 entities: 225
Run 7: Time to count 5000 entities: 222
Run 8: Time to count 5000 entities: 273
Run 9: Time to count 5000 entities: 306
Average: 246.0
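The production runs can be summarized the same way. The sketch below copies the per-run values from the answer (units as reported there, presumably milliseconds) and computes the mean, minimum and maximum for each total; the means reproduce the reported averages and stay within about 10 of each other across a fivefold increase in total entities. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.LongSummaryStatistics;

// Per-run production timings copied from the answer, one row per total entity count.
final class ProductionSummary {
    static final int[] TOTALS = {20_000, 50_000, 100_000};
    static final long[][] RUNS = {
        {242, 352, 215, 244, 241, 221, 258, 219, 260, 219},
        {346, 236, 214, 353, 244, 229, 244, 257, 216, 224},
        {215, 212, 329, 217, 230, 231, 225, 222, 273, 306},
    };

    /** Mean, min and max of one series of runs. */
    static LongSummaryStatistics stats(long[] runs) {
        return Arrays.stream(runs).summaryStatistics();
    }

    public static void main(String[] args) {
        for (int i = 0; i < TOTALS.length; i++) {
            LongSummaryStatistics s = stats(RUNS[i]);
            System.out.println(TOTALS[i] + " entities: mean=" + s.getAverage()
                    + " min=" + s.getMin() + " max=" + s.getMax());
        }
    }
}
```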

A similar question about why the time to count entities of kind X in the GAE datastore grows with the total number of entities can be found on Stack Overflow: https://stackoverflow.com/questions/34309410/
