python - How to correctly check for the end of a scroll?

Tags: python elasticsearch

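I am using the scroll method to fetch a large number of events in batches. I don't know how to stop the scroll correctly.

What I am doing now (it works) is checking for a TransportError, which indicates that a scroll attempt failed: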

scanResp = es.search(
    index="nessus_all",
    doc_type="marker",
    body={"query": {"match_all": {}}},
    search_type="scan",
    scroll="10m"
)
scrollId = scanResp['_scroll_id']
while True:
    try:
        response = es.scroll(scroll_id=scrollId, scroll="10m")
        # process results
    except Exception as e:
        log.debug("ended scroll: {e}".format(e=e))
        break
# we are done with the search

This generates an error in /var/log/elasticsearch/security.log:

[2015-02-16 09:36:07,110][DEBUG][action.search.type       ] [eu4] [2791] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [eu5][inet[/10.81.147.186:9300]][indices:data/read/search[phase/scan/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [2791]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:502)
        at org.elasticsearch.search.SearchService.executeScan(SearchService.java:236)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:939)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:930)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

And in general this does not seem like the right way to do it?

Best answer

According to Elasticsearch's Scroll documentation (as of version 5.1):

Each call to the scroll API returns the next batch of results until there are no more results left to return, ie the hits array is empty.

So I think the best approach is to check len(response['hits']['hits']): once it is 0, the scroll is finished.

A more concrete example:

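response = es.search(
    index='index_name',
    body={'query': {'match_all': {}}},  # placeholder: put your query here
    scroll='10m'
)
scroll_id = response['_scroll_id']

# An empty hits array means there are no more results to return
while len(response['hits']['hits']):
    # process results in response['hits']['hits'] before fetching the next batch
    response = es.scroll(scroll_id=scroll_id, scroll='10m')
    scroll_id = response['_scroll_id']  # the scroll id may change between calls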
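The same stopping rule can be wrapped in a generator so the calling code never sees the scroll bookkeeping. This is only a minimal sketch: the name iter_scroll_hits is hypothetical, and it assumes `es` is an elasticsearch-py client exposing search, scroll and clear_scroll with the keyword arguments used here. It also assumes a regular scrolled search; with search_type="scan" as in the question, the initial response carries no hits, so the first check would have to be skipped.

```python
def iter_scroll_hits(es, index, query, scroll="10m"):
    """Yield every hit of a scrolled search, stopping at the first empty batch.

    `es` is assumed to be an elasticsearch-py client (or any object with
    compatible search, scroll and clear_scroll methods).
    """
    response = es.search(index=index, body=query, scroll=scroll)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:  # empty list: nothing left to return
            for hit in response["hits"]["hits"]:
                yield hit
            response = es.scroll(scroll_id=scroll_id, scroll=scroll)
            # The scroll id may change between calls, so keep the latest one.
            scroll_id = response["_scroll_id"]
    finally:
        # Release the server-side search context instead of waiting for
        # the scroll timeout to expire.
        es.clear_scroll(scroll_id=scroll_id)
```

With this in place, a loop such as `for hit in iter_scroll_hits(es, 'index_name', {'query': {'match_all': {}}}):` ends on its own when the hits array comes back empty, without any exception being raised. The elasticsearch-py package also provides elasticsearch.helpers.scan, which wraps a similar loop and may be preferable to hand-written code.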

Regarding "python - How to correctly check for the end of a scroll?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28537547/
