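I measured the response time of a spatiotemporal query in two different setups.
a) A single instance running PostgreSQL with the PostGIS extension.
b) Five instances (1 master, 3 slaves, and 1 client for pgpool-II), using pgpool-II replication.
My query is: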
import time

# `cursor` is an already-open database cursor (its creation is not shown here)
startTW = time.time()
fetchinTW = """SELECT col.vessel_hash, ST_X(col.the_geom) AS long, ST_Y(col.the_geom) AS lat
FROM samplecol AS col
WHERE col.timestamp >= '2016-06-10T00:00:00.000Z' AND col.timestamp <= '2016-07-10T00:00:00.000Z' """
cursor.execute(fetchinTW)
end_query3 = time.time()
# Elapsed wall-clock time around execute(), in seconds
print("Time to execute query: %.4f s" % (end_query3 - startTW))
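A single measurement can be skewed by caching and by whether the rows are actually fetched, so a comparison like this is usually repeated several times. The following is only a minimal sketch, assuming Python 3; `time_call`, `summarize`, and `fake_query` are hypothetical names, and `fake_query` merely stands in for the real `cursor.execute(fetchinTW)` followed by `cursor.fetchall()`.

```python
import statistics
import time


def time_call(run_query, repeats=5):
    """Call `run_query` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_query():
    # Hypothetical stand-in: replace with cursor.execute(...) and cursor.fetchall()
    time.sleep(0.01)


if __name__ == "__main__":
    stats = summarize(time_call(fake_query, repeats=3))
    print("min %.4f s, median %.4f s, max %.4f s"
          % (stats["min"], stats["median"], stats["max"]))
```

Reporting the median of several runs for each setup makes the a) versus b) comparison less sensitive to one unusually slow or fast run.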
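In the results, the response time in setup a) is longer than in setup b):
a) setup -> response time: 45.3456 seconds
b) setup -> response time: 28.4658 seconds
Before running the experiment, I expected the response time with pgpool-II replication (setup b) to be longer than in setup a. My reasoning was that the data is replicated across the nodes for availability and fault tolerance, and pgpool-II has to choose a node to send the query to, which adds overhead. With a single node things are simpler, so I expected setup a to have the better response time.
Can anyone explain this behavior, or does anyone know why this happens?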
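Best answer
I believe this is expected behavior. According to the documentation, pgpool-II's parallel query feature allows a query to be split across different servers: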
Load Balance
If a database is replicated(because running in either replication mode or master/slave mode), performing a SELECT query on any server will return the same result. pgpool-II takes advantage of the replication feature in order to reduce the load on each PostgreSQL server. It does that by distributing SELECT queries among available servers, improving the system's overall throughput. In an ideal scenario, read performance could improve proportionally to the number of PostgreSQL servers. Load balancing works best in a scenario where there are a lot of users executing many read-only queries at the same time.
Parallel Query
Using the parallel query feature, data can be split among multiple servers, so that a query can be executed on all the servers concurrently, reducing the overall execution time. Parallel query works best when searching large-scale data.
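The difference between the two quoted features can be illustrated with a toy model in Python. This is not how pgpool-II is implemented; `pick_backend`, `scan_partition`, `parallel_scan`, the node names, the weights, and the sample rows are all hypothetical. In the model, load balancing sends each whole SELECT to one backend, while parallel query splits the data so that every backend scans only its own share and the partial results are merged.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def pick_backend(weights, rng):
    """Load balancing (toy): route one whole SELECT to a single backend, chosen by weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def scan_partition(rows, lo, hi):
    """Filter one server's share of the rows by an inclusive timestamp range."""
    return [r for r in rows if lo <= r[1] <= hi]


def parallel_scan(partitions, lo, hi):
    """Parallel query (toy): each backend scans its own partition, then results are merged."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        parts = pool.map(lambda p: scan_partition(p, lo, hi), partitions)
        return [row for part in parts for row in part]


if __name__ == "__main__":
    rng = random.Random(0)
    # Four hypothetical backends with equal weights
    weights = {"node0": 1, "node1": 1, "node2": 1, "node3": 1}
    print([pick_backend(weights, rng) for _ in range(8)])

    # Twelve fake (vessel, timestamp) rows spread over four partitions
    rows = [("vessel%d" % i, i) for i in range(12)]
    partitions = [rows[i::4] for i in range(4)]
    print(sorted(parallel_scan(partitions, 3, 6), key=lambda r: r[1]))
```

In this model, load balancing helps mainly when many queries arrive at once, whereas the parallel scan shortens a single large query because each partition is a quarter of the data.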
Here is the list of configuration parameters for the parallel mode feature.
Regarding python - comparing query response times in postgresql/postgis, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50924843/