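We are using Elasticsearch 6.8. I want to use the scroll API (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html) with a keep-alive such as scroll=1m. (1m is only an example; what I am asking about is the maximum value, whether that is x minutes or x hours.)
What I want to understand is this scroll keep-alive time. Every request made with the scrollId resets the keep-alive, but what is the longest keep-alive allowed? And is it a bad idea to keep a scroll alive for a long time?
I want to use a scrollId over 1 to 10 million records and export my documents in batches, one batch every minute. If my system goes down for some reason, I want to continue from where I stopped, so I would like to keep the scroll alive for as long as possible, provided that does not cost extra memory, CPU and so on. How long should the scroll be kept alive? Or should it be kept alive at all?
Thanks!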
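Best answer
By default, the maximum time a scroll context can be kept alive is 24h (24 hours). This limit can be changed with the "search.max_keep_alive" cluster setting.
Setting a larger value increases the load on the shards, because every open scroll context keeps holding resources there until it times out or is cleared.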
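As a minimal sketch of how that limit could be raised, the snippet below builds a `PUT /_cluster/settings` request that sets "search.max_keep_alive". The helper name `build_max_keep_alive_request`, the base URL, the 48h value and the choice of a persistent (rather than transient) setting are all assumptions made for illustration, not part of the original answer. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_max_keep_alive_request(base_url, max_keep_alive, persistent=True):
    """Build (but do not send) a cluster-settings update for search.max_keep_alive.

    base_url and max_keep_alive are illustrative inputs, e.g.
    "http://localhost:9200" and "48h".
    """
    # "persistent" survives a full cluster restart, "transient" does not.
    scope = "persistent" if persistent else "transient"
    body = {scope: {"search.max_keep_alive": max_keep_alive}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed local node; only runs when the script is executed directly.
    request = build_max_keep_alive_request("http://localhost:9200", "48h")
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```

Raising the limit only permits longer keep-alive values; each scroll request still has to ask for the keep-alive it wants, and the shard cost described above applies for as long as the context stays open.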
From the documentation:
Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration
From the documentation:
Normally, the background merge process optimizes the index by merging together smaller segments to create new bigger segments, at which time the smaller segments are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted while they are still in use. This is how Elasticsearch is able to return the results of the initial search request, regardless of subsequent changes to documents.
From the documentation:
Search context are automatically removed when the scroll timeout has been exceeded. However keeping scrolls open has a cost, as discussed in the previous section so scrolls should be explicitly cleared as soon as the scroll is not being used anymore using the clear-scroll API:
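A hedged sketch of that advice in Python: the generator below pages through a scroll with a short keep-alive (each scroll request renews it, so it only needs to cover the time between two batches) and always calls the clear-scroll API when it finishes or fails. The names `scroll_all`, `http_transport` and the `transport(method, path, body)` callable are hypothetical helpers invented for this example; the three REST calls are the initial search with `?scroll=`, `POST /_search/scroll` and `DELETE /_search/scroll`.

```python
import json
import urllib.request


def http_transport(base_url):
    """Return a hypothetical transport(method, path, body) -> dict using plain HTTP."""
    def send(method, path, body):
        request = urllib.request.Request(
            url=base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return send


def scroll_all(transport, index, keep_alive="1m", page_size=1000):
    """Yield every hit of a match-all scroll, then clear the scroll context."""
    # Sorting by _doc is the cheapest order when the order itself does not matter.
    response = transport(
        "POST",
        "/%s/_search?scroll=%s" % (index, keep_alive),
        {"size": page_size, "sort": ["_doc"]},
    )
    scroll_id = response.get("_scroll_id")
    try:
        while response["hits"]["hits"]:
            for hit in response["hits"]["hits"]:
                yield hit
            # Each scroll request resets the keep-alive for the next batch.
            response = transport(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # The scroll id may change between requests; always use the latest one.
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Free the search context right away instead of waiting for the timeout.
        if scroll_id:
            transport("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


if __name__ == "__main__":
    # Assumed local node and index name, for illustration only.
    for doc in scroll_all(http_transport("http://localhost:9200"), "my-index"):
        print(doc["_id"])
```

Putting the clear call in `finally` means the context is released even if the export loop raises or the caller stops iterating early and closes the generator.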
Regarding elasticsearch - ElasticSearch Scroll API connection time, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61090931/