elasticsearch - How to send a scroll_id to ElasticSearch using Curl

Tags: elasticsearch curl

I can't figure out how to send the scroll_id to ElasticSearch using Curl.
This is what I have tried so far, but it doesn't seem to work.

$url = "http://distribution.virk.dk/cvr-permanent/virksomhed/_search?scroll=2m&_scroll_id=".$_POST["scroll_id"];
$data = array(
    "_scroll_id" => $_POST["scroll_id"],
    "scroll_id" => $_POST["scroll_id"],
    "size" => 10,
    "_source" => array(
        "Vrvirksomhed.cvrNummer",
        "Vrvirksomhed.elektroniskPost",
        "Vrvirksomhed.livsforloeb",
        "Vrvirksomhed.hjemmeside",
        "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn",
        "Vrvirksomhed.hovedbranche",
        "Vrvirksomhed.penheder",
        "Vrvirksomhed.telefonnummer",
        "Vrvirksomhed.virksomhedMetadata.nyesteBeliggenhedsadresse"
    ),
    "query" => array (
        "bool" => array (
            "must_not" => array (
                "exists" => array (
                    "field" => "Vrvirksomhed.livsforloeb.periode.gyldigTil"
                )
            )
        )
    )
);
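ElasticSearch returns the same 10 posts every time, so I think it isn't receiving the scroll_id correctly.
I updated the code after trying Val's suggestion. With setHosts, the request times out after a long time. Without setHosts, I get an error saying there are no alive nodes in my cluster.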
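The repeated first page fits how the scroll API works. Every request in the attempt above goes to `.../_search?scroll=2m` with the query attached, which most likely opens a fresh scroll context and returns page one again, whatever `_scroll_id` is sent alongside. Later pages come from a different endpoint, `/_search/scroll`, whose body carries only the keep-alive and the scroll ID. The sketch below builds the two request shapes as (URL, JSON body) pairs; the function names and the `CVR_BASE` constant are hypothetical, and it assumes distribution.virk.dk serves the standard `/_search/scroll` endpoint at its root.

```php
<?php
// Sketch: the two request shapes of the scroll API as (URL, JSON body) pairs.
// Assumption: the server exposes the standard /_search/scroll endpoint at its root.

const CVR_BASE = 'http://distribution.virk.dk';

/** First request only: index-scoped _search with ?scroll=..., carrying query, size and _source. */
function buildFirstScrollRequest(array $body, string $keepAlive = '2m'): array
{
    $url = CVR_BASE . '/cvr-permanent/virksomhed/_search?scroll=' . rawurlencode($keepAlive);
    return [$url, json_encode($body)];
}

/** Every later request: no index and no query, just the keep-alive and the latest scroll_id. */
function buildNextScrollRequest(string $scrollId, string $keepAlive = '2m'): array
{
    $url = CVR_BASE . '/_search/scroll';
    return [$url, json_encode(['scroll' => $keepAlive, 'scroll_id' => $scrollId])];
}
```

In this split, the `_scroll_id`/`scroll_id` keys disappear from the first body, and the `_source` and `query` keys never appear in follow-up bodies.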
<?php
use Elasticsearch\ClientBuilder;

require 'vendor/autoload.php';

$username = "MY_USERNAME";
$password = "MY_PASSWORD";

$hosts = [
    'host'   => 'distribution.virk.dk',
    'scheme' => 'http',
    'path'   => '/cvr-permanent',
    'port'   => '80',
    'user'   => $username,
    'pass'   => $password
];

$client = ClientBuilder::create()->setHosts($hosts)->build();
$params = [
    'scroll' => '30s',
    'size'   => 50,
    'type'   => '/cvr-permanent/virksomhed',
    'index'  => 'virksomhed',
    'body'   => [
        'query' => [
            'match_all' => new \stdClass()
        ]
    ]
];

// Execute the search
// The response will contain the first batch of documents
// and a scroll_id
$response = $client->search($params);

// Now we loop until the scroll "cursors" are exhausted
while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {

    // **
    // Do your work here, on the $response['hits']['hits'] array
    // **

    // When done, get the new scroll_id
    // You must always refresh your _scroll_id!  It can change sometimes
    $scroll_id = $response['_scroll_id'];

    // Execute a Scroll request and repeat
    $response = $client->scroll([
        'body' => [
            'scroll_id' => $scroll_id,  //...using our previously obtained _scroll_id
            'scroll'    => '30s'        // and the same timeout window
        ]
    ]);
}
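Both failures are consistent with how `setHosts()` reads its argument in elasticsearch-php 6.x/7.x: it expects a list of hosts, each of which may be an extended-configuration array. Given one flat associative array, it likely treats every value (`'distribution.virk.dk'`, `'http'`, the username, and so on) as a separate host on the default port 9200, which would explain the long timeout; without `setHosts()` the client falls back to `localhost:9200`, hence no alive nodes. Separately, `index` and `type` do not match the URL from the first attempt (`/cvr-permanent/virksomhed/_search`). This sketch assumes elasticsearch-php 6.x or 7.x (8.x changed the namespace and dropped types); `cvrHosts`, `cvrScrollParams` and `buildClient` are hypothetical helper names.

```php
<?php
// Sketch assuming elasticsearch-php 6.x/7.x installed with Composer.
// Only buildClient() touches the library; the array helpers run without it.

/** setHosts() expects a LIST of hosts, so the extended config is wrapped in an outer array. */
function cvrHosts(string $user, string $pass): array
{
    return [[
        'host'   => 'distribution.virk.dk',
        'scheme' => 'http',
        'port'   => 80,
        'user'   => $user,
        'pass'   => $pass,
        // No 'path': it would also prefix the /_search/scroll requests.
    ]];
}

/** Index and type taken from the first attempt's URL: /cvr-permanent/virksomhed/_search. */
function cvrScrollParams(int $size = 50, string $keepAlive = '30s'): array
{
    return [
        'index'  => 'cvr-permanent',
        'type'   => 'virksomhed',
        'scroll' => $keepAlive,
        'size'   => $size,
        'body'   => ['query' => ['match_all' => new \stdClass()]],
    ];
}

/** Builds the client; the class is only resolved when this function is called. */
function buildClient(string $user, string $pass)
{
    require_once __DIR__ . '/vendor/autoload.php';
    return \Elasticsearch\ClientBuilder::create()
        ->setHosts(cvrHosts($user, $pass))
        ->build();
}
```

With these helpers, the scroll loop from the question should work unchanged once the setup is replaced by `$client = buildClient($username, $password);` and `$response = $client->search(cvrScrollParams());`.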

Best answer

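There are two steps to using the scroll API.
In the first step, you send the query together with how long the scroll context should be kept alive.
In the second step, you don't send the query again; you send only the scroll ID returned by the previous scroll search.
You can find a complete example here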
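The two steps can be sketched with PHP's cURL extension directly, which is closer to what the question title asks for. The query is sent once, to the index-scoped `_search?scroll=...` URL; every later request goes to `/_search/scroll` with only `scroll` and the most recent `scroll_id`. This is a sketch under assumptions: basic authentication as in the updated question code, the `/_search/scroll` endpoint at the root of the same host, and hypothetical function names (`curlPostJson`, `scrollAll`). The HTTP call is injected as a callable so the loop can be exercised without a live cluster.

```php
<?php
// Sketch of the two-step scroll API over plain PHP cURL (ext-curl), no client library.
// Assumptions: basic auth, and /_search/scroll served at the root of the same host.

/** Real transport: POST a JSON body and decode the JSON reply into an array. */
function curlPostJson(string $url, array $body, string $user, string $pass): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($body),
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_USERPWD        => $user . ':' . $pass,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        $err = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL error: ' . $err);
    }
    curl_close($ch);
    return json_decode($raw, true) ?? [];
}

/**
 * Step 1 sends the query once with ?scroll=...; step 2 repeatedly sends only
 * {"scroll", "scroll_id"} to /_search/scroll until a page comes back empty.
 * $post is any callable(string $url, array $body): array. Returns the hit count.
 */
function scrollAll(callable $post, string $base, string $indexPath, array $searchBody,
                   callable $onHits, string $keepAlive = '2m'): int
{
    $total = 0;
    $response = $post("$base/$indexPath/_search?scroll=$keepAlive", $searchBody);
    while (!empty($response['hits']['hits'])) {
        $onHits($response['hits']['hits']);
        $total += count($response['hits']['hits']);
        // Always send the most recent _scroll_id; it may change between pages.
        $response = $post("$base/_search/scroll", [
            'scroll'    => $keepAlive,
            'scroll_id' => $response['_scroll_id'],
        ]);
    }
    return $total;
}
```

For the CVR endpoint, the call would look like `scrollAll(fn ($url, $body) => curlPostJson($url, $body, 'MY_USERNAME', 'MY_PASSWORD'), 'http://distribution.virk.dk', 'cvr-permanent/virksomhed', $data, $handleHits)`, where `$data` is the question's request body without the two scroll ID keys and `$handleHits` is a hypothetical per-page callback. The scroll context expires on its own after the keep-alive; it can also be released early with a `DELETE` request to `/_search/scroll`.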

A similar question, elasticsearch - How to send a scroll_id to ElasticSearch using Curl, can be found on Stack Overflow: https://stackoverflow.com/questions/63701776/

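Related articles:

elasticsearch - Enabling xpack features on Open Distro For Elasticsearch

node.js - How to sort on a field of type 'text' in Elasticsearch

java - CURL: client and server communication, Windows C/Java

php - Why turn off the header in curl?

windows - Proxy settings for the Windows 7 command prompt

java - https://graph.facebook.com/v2.10/me/photos?access_token=xxxxxx returns an error

python - Converting a PDF file to Base64 for indexing into Elasticsearch

hadoop - Elasticsearch only inserted 10 documents

Datetimes in Elasticsearch - how to handle time zones

php - curl error 18 - transfer closed with outstanding read data remaining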