Elasticsearch: aggregating data and faceting on arrays

Tags: elasticsearch faceted-search

I index some companies, where the countries property is an array of country codes:

curl -XPUT 'http://localhost:9200/test/company/10' -d '{"countries" : ["CH", "CN"], "name" : "company10"}'
curl -XPUT 'http://localhost:9200/test/company/11' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR"], "name" : "company11"}'
curl -XPUT 'http://localhost:9200/test/company/12' -d '{"countries" : ["AT", "CN", "EN", "FR"], "name" : "company12"}'
curl -XPUT 'http://localhost:9200/test/company/13' -d '{"countries" : ["CH", "CN", "HU"], "name" : "company13"}'
curl -XPUT 'http://localhost:9200/test/company/14' -d '{"countries" : ["CH", "CN", "EN", "FR"], "name" : "company14"}'
curl -XPUT 'http://localhost:9200/test/company/15' -d '{"countries" : ["AT", "CN", "DE", "EN", "FR", "HU"], "name" : "company15"}'
curl -XPUT 'http://localhost:9200/test/company/16' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company16"}'
curl -XPUT 'http://localhost:9200/test/company/17' -d '{"countries" : ["BE", "CN", "EN"], "name" : "company17"}'
curl -XPUT 'http://localhost:9200/test/company/18' -d '{"countries" : ["AT", "CH", "CN", "DE"], "name" : "company18"}'
curl -XPUT 'http://localhost:9200/test/company/19' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company19"}'
curl -XPUT 'http://localhost:9200/test/company/20' -d '{"countries" : ["EN", "FR"], "name" : "company20"}'
curl -XPUT 'http://localhost:9200/test/company/21' -d '{"countries" : ["AT", "BE", "DE", "FR", "HU"], "name" : "company21"}'
curl -XPUT 'http://localhost:9200/test/company/22' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company22"}'
curl -XPUT 'http://localhost:9200/test/company/23' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "HU"], "name" : "company23"}'
curl -XPUT 'http://localhost:9200/test/company/24' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR"], "name" : "company24"}'
curl -XPUT 'http://localhost:9200/test/company/25' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR"], "name" : "company25"}'
curl -XPUT 'http://localhost:9200/test/company/26' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company26"}'
curl -XPUT 'http://localhost:9200/test/company/27' -d '{"countries" : ["AT", "EN", "FR"], "name" : "company27"}'
curl -XPUT 'http://localhost:9200/test/company/28' -d '{"countries" : ["CN"], "name" : "company28"}'
curl -XPUT 'http://localhost:9200/test/company/29' -d '{"countries" : ["BE", "CH", "CN", "EN", "FR"], "name" : "company29"}'
curl -XPUT 'http://localhost:9200/test/company/30' -d '{"countries" : ["CN"], "name" : "company30"}'

I want to aggregate the companies by country code (the countries property) and count how many companies each country has.

Unfortunately, even this (a count for the single code AT) does not work:

curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "AT" }
      }
    }
  }
}
'

I get:

...

"facets" : {
  "foo" : {
    "_type" : "filter",
    "count" : 0
  }
}

What am I doing wrong?

Best answer

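I think it is because filters are not analyzed. The countries field, on the other hand, is analyzed at index time, and AT is a stop word ("at"), so it is never indexed. You can check this with the _analyze API: http://localhost:9200/test/_analyze?text=AT&field=countries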

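You can also check a code that is not a stop word, such as CN, but it gets lowercased: http://localhost:9200/test/_analyze?text=CN&field=countries. So cn (which is what is actually stored in the index) does not match the CN in your facet filter.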
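
To make the analysis step concrete, here is a minimal Python sketch of what the default analyzer appears to do to these values. It is not Elasticsearch code: `STOP_WORDS`, `analyze`, and `term_filter_matches` are hypothetical names, the stop list is the classic Lucene English set as I recall it, and tokenization is reduced to a whitespace split.

```python
# Simplified stand-in for the default analyzer on an analyzed string field:
# lowercase every token, then drop English stop words.
# Assumption: this is the classic Lucene English stop set; check your version.
STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)


def analyze(text):
    """Return the tokens that would end up in the index for `text`."""
    tokens = [token.lower() for token in text.split()]
    return [token for token in tokens if token not in STOP_WORDS]


def term_filter_matches(filter_value, field_values):
    """A term filter compares its value verbatim against the indexed tokens."""
    indexed = {token for value in field_values for token in analyze(value)}
    return filter_value in indexed


countries = ["AT", "BE", "CH", "CN"]
print(analyze("AT"))                         # []
print(analyze("CN"))                         # ['cn']
print(term_filter_matches("AT", countries))  # False
print(term_filter_matches("CN", countries))  # False
print(term_filter_matches("cn", countries))  # True
```

Under this assumed stop list, BE is dropped for the same reason as AT, so a lowercase filter alone would not recover either code.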

You can try changing the search to use the lowercase country abbreviation:

curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "cn" }
      }
    }
  }
}'

This gives:

"facets" : {
    "foo" : {
      "_type" : "filter",
      "count" : 15
    }
  }

But I think you should define the mapping for countries with "index": "not_analyzed" to avoid both problems (stop words and lowercasing):

# Delete index
#
curl -XDELETE 'http://localhost:9200/test'

# Create with mapping
#
curl -XPUT 'http://localhost:9200/test/' -d '{
  "mappings": {
    "company": {
      "properties": {
        "countries": { "type": "string", "index" : "not_analyzed"  }
      }
    }
  }
}'


# Index documents
#
curl -XPUT 'http://localhost:9200/test/company/10' -d '{"countries" : ["CH", "CN"], "name" : "company10"}'
curl -XPUT 'http://localhost:9200/test/company/11' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR"], "name" : "company11"}'
curl -XPUT 'http://localhost:9200/test/company/12' -d '{"countries" : ["AT", "CN", "EN", "FR"], "name" : "company12"}'
curl -XPUT 'http://localhost:9200/test/company/13' -d '{"countries" : ["CH", "CN", "HU"], "name" : "company13"}'
curl -XPUT 'http://localhost:9200/test/company/14' -d '{"countries" : ["CH", "CN", "EN", "FR"], "name" : "company14"}'
curl -XPUT 'http://localhost:9200/test/company/15' -d '{"countries" : ["AT", "CN", "DE", "EN", "FR", "HU"], "name" : "company15"}'
curl -XPUT 'http://localhost:9200/test/company/16' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company16"}'
curl -XPUT 'http://localhost:9200/test/company/17' -d '{"countries" : ["BE", "CN", "EN"], "name" : "company17"}'
curl -XPUT 'http://localhost:9200/test/company/18' -d '{"countries" : ["AT", "CH", "CN", "DE"], "name" : "company18"}'
curl -XPUT 'http://localhost:9200/test/company/19' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company19"}'
curl -XPUT 'http://localhost:9200/test/company/20' -d '{"countries" : ["EN", "FR"], "name" : "company20"}'
curl -XPUT 'http://localhost:9200/test/company/21' -d '{"countries" : ["AT", "BE", "DE", "FR", "HU"], "name" : "company21"}'
curl -XPUT 'http://localhost:9200/test/company/22' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company22"}'
curl -XPUT 'http://localhost:9200/test/company/23' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "HU"], "name" : "company23"}'
curl -XPUT 'http://localhost:9200/test/company/24' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR"], "name" : "company24"}'
curl -XPUT 'http://localhost:9200/test/company/25' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR"], "name" : "company25"}'
curl -XPUT 'http://localhost:9200/test/company/26' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company26"}'
curl -XPUT 'http://localhost:9200/test/company/27' -d '{"countries" : ["AT", "EN", "FR"], "name" : "company27"}'
curl -XPUT 'http://localhost:9200/test/company/28' -d '{"countries" : ["CN"], "name" : "company28"}'
curl -XPUT 'http://localhost:9200/test/company/29' -d '{"countries" : ["BE", "CH", "CN", "EN", "FR"], "name" : "company29"}'
curl -XPUT 'http://localhost:9200/test/company/30' -d '{"countries" : ["CN"], "name" : "company30"}'

# Refresh index
#
curl -XPOST 'http://localhost:9200/test/_refresh'

# Search
#
curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "AT" }
      }
    }
  }
}
'
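
Once the field is not_analyzed, each code is indexed verbatim, so the facet count becomes a plain exact-match count. The Python sketch below is a local cross-check, not output from Elasticsearch: it tallies the sample documents the way an exact-term count would, which gives a number to compare the facet result against. `DOCS` and `count_by_country` are hypothetical names.

```python
from collections import Counter

# The sample documents from the question: company id -> country codes.
DOCS = {
    10: ["CH", "CN"],
    11: ["AT", "CH", "CN", "DE", "EN", "FR"],
    12: ["AT", "CN", "EN", "FR"],
    13: ["CH", "CN", "HU"],
    14: ["CH", "CN", "EN", "FR"],
    15: ["AT", "CN", "DE", "EN", "FR", "HU"],
    16: ["AT", "BE", "CH", "DE", "EN", "FR", "HU"],
    17: ["BE", "CN", "EN"],
    18: ["AT", "CH", "CN", "DE"],
    19: ["AT", "CH", "CN", "DE", "EN", "FR", "HU"],
    20: ["EN", "FR"],
    21: ["AT", "BE", "DE", "FR", "HU"],
    22: ["AT", "BE", "CH", "DE", "EN", "FR", "HU"],
    23: ["AT", "BE", "CH", "CN", "DE", "EN", "HU"],
    24: ["AT", "BE", "CH", "CN", "DE", "EN", "FR"],
    25: ["AT", "BE", "CH", "DE", "EN", "FR"],
    26: ["AT", "BE", "CH", "CN", "DE", "EN", "FR", "HU"],
    27: ["AT", "EN", "FR"],
    28: ["CN"],
    29: ["BE", "CH", "CN", "EN", "FR"],
    30: ["CN"],
}


def count_by_country(docs):
    """Count how many documents contain each country code (exact match)."""
    counts = Counter()
    for codes in docs.values():
        counts.update(set(codes))  # a document counts once per code
    return counts


counts = count_by_country(DOCS)
print(counts["AT"])  # 13
print(counts["CN"])  # 15
for code, count in sorted(counts.items()):
    print(code, count)
```

The CN tally agrees with the count of 15 reported for the lowercase cn filter, and the same tally suggests the AT filter should report 13 after reindexing. To get every country in one request, the facets API also offers a terms facet on the countries field; that suggestion is mine and is not part of the original answer.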

This post on aggregating data and faceting on arrays in Elasticsearch is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/19114181/
