elasticsearch - Removing stop words from a basic terms aggregation in Elasticsearch?

I'm somewhat new to Elasticsearch, but basically I have an index called posts that contains multiple post documents in the following format:

"post": {
    "id": 123,
    "message": "Some message"
}

I'm trying to get the most frequent words in the message field across the whole index with a simple terms aggregation:

curl -XPOST 'localhost:9200/posts/_search?pretty' -d '
{
    "aggs": {
        "frequent_words": {
            "terms": {
                "field": "message"
            }
        }
    }
}
'

Unfortunately, this aggregation includes stop words, so I end up with a list of words like "and", "the" and "then" instead of more meaningful ones.
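To illustrate why such words dominate, here is a minimal Python simulation (not Elasticsearch code). It assumes a rough stand-in for the analyzer (regex word tokens plus lowercasing; Lucene's real StandardTokenizer is more elaborate) and counts, like a terms aggregation's doc_count, how many documents contain each term. The sample messages and the three-word stop list are hypothetical.

```python
import re
from collections import Counter

def analyze(text, stopwords=frozenset()):
    # Rough stand-in for an analyzer: lowercase word tokens, then drop
    # any token found in `stopwords` (like a stop token filter).
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]

def top_terms(docs, stopwords=frozenset()):
    # Each term counts at most once per document, like doc_count.
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc, stopwords)))
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample messages.
messages = [
    "The cat and the dog",
    "Then the dog slept",
    "The cat ate and then slept",
]

print(top_terms(messages))
# [('the', 3), ('and', 2), ('cat', 2), ('dog', 2), ('slept', 2), ('then', 2), ('ate', 1)]

# Hypothetical subset of an English stop list.
print(top_terms(messages, stopwords={"and", "the", "then"}))
# [('cat', 2), ('dog', 2), ('slept', 2), ('ate', 1)]
```

The aggregation can only report tokens that survive analysis, so the stop words have to be removed by the analyzer applied to the field.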

I tried applying an analyzer to exclude those stop words, but to no avail:

curl -XPUT 'localhost:9200/posts/?pretty' -d '
{
    "settings": {
        "analysis": {
            "analyzer": {
                "standard": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    }
}'

Am I using the analyzer correctly, or am I going about this the wrong way? Thanks!

Best answer

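I think you forgot to set the analyzer on the message field in your type mapping. Elasticsearch aggregates over its indexed data, so if the field is analyzed correctly, your stop words are never indexed and will not appear in the aggregation. You can check this link. I ran the following requests with the Sense plugin for Kibana. Pay attention to the mapping creation request: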

PUT /posts
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "standard",
                    "stopwords": ["test", "testable"]
                }
            }
        }
    }
}

### Don't forget these lines
POST /posts/post/_mapping
{
  "properties": {
    "message": {
      "type": "string", 
      "analyzer": "my_analyzer"
    }
  }
}

POST posts/post/1
{
  "id": 1,
  "message": "Some messages"
}

POST posts/post/2
{
  "id": 2,
  "message": "Some testable message"
}

POST posts/post/3
{
  "id": 3,
  "message": "Some test message"
}


POST /posts/_search
{
    "aggs": {
        "frequent_words": {
            "terms": {
                "field": "message"
            }
        }
    }
}

Here is the result set of this search request:

{
  "hits": {
  ...
  },
  "aggregations": {
    "frequent_words": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "some",
          "doc_count": 3
        },
        {
          "key": "message",
          "doc_count": 2
        },
        {
          "key": "messages",
          "doc_count": 1
        }
      ]
    }
  }
}
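The answer's point that aggregations read indexed data can be sketched in Python as two phases: index time builds an inverted index through the field's analyzer, and query time builds buckets only from terms in that index. This is a simulation, not Elasticsearch; the tokenizer is a regex approximation of the standard analyzer (an assumption), while the stop list and documents mirror the answer's `my_analyzer` and posts 1 to 3.

```python
import re
from collections import defaultdict

# Stop list mirroring the answer's "my_analyzer" settings.
STOPWORDS = {"test", "testable"}

def my_analyzer(text):
    # Approximation of a standard analyzer with custom stopwords:
    # lowercase word tokens, minus the stop list.
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]

def build_index(docs, analyzer):
    # Index time: map each term to the ids of documents containing it.
    # Stop words are dropped here and never enter the index.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index

def terms_aggregation(index, size=10):
    # Query time: buckets come only from terms present in the index.
    buckets = [{"key": t, "doc_count": len(ids)} for t, ids in index.items()]
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]

posts = {1: "Some messages", 2: "Some testable message", 3: "Some test message"}
for bucket in terms_aggregation(build_index(posts, my_analyzer)):
    print(bucket)
# {'key': 'some', 'doc_count': 3}
# {'key': 'message', 'doc_count': 2}
# {'key': 'messages', 'doc_count': 1}
```

This reproduces the buckets above. "message" and "messages" stay separate because this analyzer does no stemming.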

Regarding "elasticsearch - Removing stop words from a basic terms aggregation in Elasticsearch?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/39262837/
