elasticsearch - Elasticsearch query-time boosting produces inadequate result order

Tags: elasticsearch lucene

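The ES search results for the search keywords "one two three" look wrong after I apply a boost to each keyword. Please help me modify my "faulty" query so that it produces the "expected result" described below. I'm using ES 1.7.4 with Lucene 4.10.4.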

Boost criteria: "three" is considered the most important keyword.

word   boost
----   -----
one    1
two    2
three  3
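In other words, the intended score of a document is just the sum of the boosts of the search keywords that appear in its title, and a document containing none of them should not be returned. Writing T(d) for the set of words in the title of document d (notation introduced here for illustration):

```latex
\mathrm{score}(d) = \sum_{k \in K \cap T(d)} w_k,
\qquad K = \{\text{one}, \text{two}, \text{three}\},
\qquad w_{\text{one}} = 1,\; w_{\text{two}} = 2,\; w_{\text{three}} = 3
```

For example, document 7 ("two three") should score w_two + w_three = 2 + 3 = 5, while document 8 ("none") has K ∩ T(d) = ∅ and should not match at all.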

ES index content (only the MySQL dump is shown, to keep the post short):
mysql> SELECT id, title FROM post;
+----+-------------------+
| id | title             |
+----+-------------------+
|  1 | one               |
|  2 | two               |
|  3 | three             |
|  4 | one two           |
|  5 | one three         |
|  6 | one two three     |
|  7 | two three         |
|  8 | none              |
|  9 | one abc           |
| 10 | two abc           |
| 11 | three abc         |
| 12 | one two abc       |
| 13 | one two three abc |
| 14 | two three abc     |
+----+-------------------+
14 rows in set (0.00 sec)

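Expected ES query results: the user is searching for "one two three". I'm not fussed about the order of records that have the same score; for example, I don't care if records 6 and 13 swap places.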
+----+-------------------+
| id | title             | my scores for demonstration purposes
+----+-------------------+
|  6 | one two three     | (1+2+3 = 6)
| 13 | one two three abc | (1+2+3 = 6)
|  7 | two three         | (2+3 = 5)
| 14 | two three abc     | (2+3 = 5)
|  5 | one three         | (1+3 = 4)
|  4 | one two           | (1+2 = 3)
| 12 | one two abc       | (1+2 = 3)
|  3 | three             | (3 = 3)
| 11 | three abc         | (3 = 3)
|  2 | two               | (2 = 2)
| 10 | two abc           | (2 = 2)
|  1 | one               | (1 = 1)
|  9 | one abc           | (1 = 1)
|  8 | none              | <- This shouldn't appear
+----+-------------------+
14 rows in set (0.00 sec)

Unexpected ES query results: unfortunately, this is what I get.
+----+-------------------+
| id | title             | _score
+----+-------------------+
|  6 | one two three     | 1.0013864
| 13 | one two three abc | 1.0013864
|  4 | one two           | 0.57794875
|  3 | three             | 0.5310148
|  7 | two three         | 0.50929534
|  5 | one three         | 0.503356
| 14 | two three abc     | 0.4074363
| 11 | three abc         | 0.36586377
| 12 | one two abc       | 0.30806428
| 10 | two abc           | 0.23231897
|  2 | two               | 0.12812772
|  1 | one               | 0.084527075
|  9 | one abc           | 0.07408653
+----+-------------------+
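Comparing the two tables makes the problem concrete. The sketch below uses only the ids and scores from the tables above (the helper name misordered_pairs is hypothetical) and lists every pair of returned documents where a document with a lower expected score is ranked above one with a higher expected score:

```python
# Expected scores from the "my scores for demonstration purposes" column.
# Document 8 is left out because it should not match at all.
EXPECTED = {
    6: 6, 13: 6, 7: 5, 14: 5, 5: 4,
    4: 3, 12: 3, 3: 3, 11: 3,
    2: 2, 10: 2, 1: 1, 9: 1,
}

# Document ids in the order Elasticsearch actually returned them.
OBSERVED = [6, 13, 4, 3, 7, 5, 14, 11, 12, 10, 2, 1, 9]


def misordered_pairs(observed, expected):
    """Return (earlier_id, later_id) pairs where the earlier-ranked
    document has a strictly lower expected score than the later one.
    Ties (e.g. docs 6 and 13) are never reported."""
    pairs = []
    for i, earlier in enumerate(observed):
        for later in observed[i + 1:]:
            if expected[earlier] < expected[later]:
                pairs.append((earlier, later))
    return pairs


if __name__ == "__main__":
    bad = misordered_pairs(OBSERVED, EXPECTED)
    print(len(bad), "misordered pairs")
    for earlier, later in bad:
        print(f"doc {earlier} (expected {EXPECTED[earlier]}) ranked above "
              f"doc {later} (expected {EXPECTED[later]})")
```

With this data it reports 7 misordered pairs: documents 3 and 4 (expected score 3) are each ranked above documents 5, 7 and 14, and document 5 (expected 4) is ranked above document 14 (expected 5).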

ES query
curl -XPOST "http://127.0.0.1:9200/post_dev/_search" -d'
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": {
            "query": "one two three"
          }
        }
      },
      "should": [
        {
          "match": {
            "title": {
              "query": "one",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "two",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "three",
              "boost": 3
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "from": "0",
  "size": "100"
}'
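A likely reason the should-clause boosts do not add up as hoped is that, with the default TF/IDF similarity in Lucene 4.x, each matching term contributes roughly tf · idf² · boost · lengthNorm, where idf = 1 + ln(numDocs / (docFreq + 1)) and lengthNorm = 1 / sqrt(number of terms in the field). The boost only scales that contribution, so shorter titles and rarer words still win. The sketch below is a simplified model, not Elasticsearch's implementation: it assumes all 14 documents sit in one shard and ignores queryNorm, the coord factor and Lucene's lossy one-byte norm encoding, so its numbers will not match the _score values above:

```python
import math

# Titles from the MySQL dump above (doc id -> title).
TITLES = {
    1: "one", 2: "two", 3: "three", 4: "one two", 5: "one three",
    6: "one two three", 7: "two three", 8: "none", 9: "one abc",
    10: "two abc", 11: "three abc", 12: "one two abc",
    13: "one two three abc", 14: "two three abc",
}


def idf(term, docs):
    """Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))."""
    doc_freq = sum(1 for title in docs.values() if term in title.split())
    return 1.0 + math.log(len(docs) / (doc_freq + 1))


def term_weight(term, doc_id, docs, boost=1.0):
    """Simplified per-term contribution: tf * idf^2 * boost * lengthNorm.

    Assumption: single shard, no queryNorm, no coord, exact (unencoded)
    length norm. Only the relative sizes are meaningful.
    """
    tokens = docs[doc_id].split()
    tf = math.sqrt(tokens.count(term))
    length_norm = 1.0 / math.sqrt(len(tokens))
    return tf * idf(term, docs) ** 2 * boost * length_norm


if __name__ == "__main__":
    # Same clause ("three" with boost 3), different title lengths.
    for doc_id in (3, 11, 7, 14):
        w = term_weight("three", doc_id, TITLES, boost=3)
        print(f"doc {doc_id:>2} {TITLES[doc_id]!r:<17} weight {w:.3f}")
```

Even in this simplified form, the "three" clause gives document 3 ("three") a larger contribution than document 7 ("two three") or document 14 ("two three abc"), and "two" (8 documents) gets a lower idf than "one" or "three" (7 documents each), so the boosts 1/2/3 cannot translate into the flat sums of the expected table.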

Some other test queries:
  • This query doesn't return any results.
  • This query's order is not right either, as seen here.

Best answer

    # Index some test data
    curl -XPUT "localhost:9200/test/doc/1" -d '{"title": "one"}'
    curl -XPUT "localhost:9200/test/doc/2" -d '{"title": "two"}'
    curl -XPUT "localhost:9200/test/doc/3" -d '{"title": "three"}'
    curl -XPUT "localhost:9200/test/doc/4" -d '{"title": "one two"}'
    curl -XPUT "localhost:9200/test/doc/5" -d '{"title": "one three"}'
    curl -XPUT "localhost:9200/test/doc/6" -d '{"title": "one two three"}'
    curl -XPUT "localhost:9200/test/doc/7" -d '{"title": "two three"}'
    curl -XPUT "localhost:9200/test/doc/8" -d '{"title": "none"}'
    curl -XPUT "localhost:9200/test/doc/9" -d '{"title": "one abc"}'
    curl -XPUT "localhost:9200/test/doc/10" -d '{"title": "two abc"}'
    curl -XPUT "localhost:9200/test/doc/11" -d '{"title": "three abc"}'
    curl -XPUT "localhost:9200/test/doc/12" -d '{"title": "one two abc"}'
    curl -XPUT "localhost:9200/test/doc/13" -d '{"title": "one two three abc"}'
    curl -XPUT "localhost:9200/test/doc/14" -d '{"title": "two three abc"}'
    # Make test data available for search
    curl -XPOST "localhost:9200/test/_refresh?pretty"
    # Search using function score
    curl -XPOST "localhost:9200/test/doc/_search?pretty" -d'{
        "query": {
            "function_score": {
                "query": {
                    "match": {
                        "title": "one two three"
                    }
                },
                "functions": [
                    {
                        "filter": {
                            "query": {
                                "match": {
                                    "title": "one"
                                }
                            }
                        },
                        "weight": 1
                    },
                    {
                        "filter": {
                            "query": {
                                "match": {
                                    "title": "two"
                                }
                            }
                        },
                        "weight": 2
                    },
                    {
                        "filter": {
                            "query": {
                                "match": {
                                    "title": "three"
                                }
                            }
                        },
                        "weight": 3
                    }
                ],
                "score_mode": "sum",
                "boost_mode": "replace"
            }
        },
        "sort": [
            {
                "_score": {
                    "order": "desc"
                }
            }
        ],
        "from": "0",
        "size": "100"
    }'
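The answer works because the function_score settings separate matching from scoring: the inner match query (OR over "one two three") only decides which documents are returned, so document 8 is excluded; each function adds its weight when its filter matches ("score_mode": "sum"); and "boost_mode": "replace" throws away the TF/IDF query score. The sketch below is a plain-Python model of that behaviour as I understand it, not Elasticsearch code; the helpers matches and function_score are hypothetical, and whitespace tokenization stands in for the real analyzer:

```python
# Titles from the MySQL dump in the question (doc id -> title).
TITLES = {
    1: "one", 2: "two", 3: "three", 4: "one two", 5: "one three",
    6: "one two three", 7: "two three", 8: "none", 9: "one abc",
    10: "two abc", 11: "three abc", 12: "one two abc",
    13: "one two three abc", 14: "two three abc",
}

# Mirrors the "functions" array of the answer: (filter term, weight).
FUNCTIONS = [("one", 1), ("two", 2), ("three", 3)]


def matches(title, query):
    """OR semantics of a default match query: any query token present."""
    return bool(set(query.split()) & set(title.split()))


def function_score(titles, query, functions):
    """Model of score_mode=sum + boost_mode=replace.

    The query acts only as a filter; the final score is the sum of the
    weights of the functions whose filter matches the document.
    """
    results = []
    for doc_id, title in titles.items():
        if not matches(title, query):
            continue  # e.g. doc 8 "none" is never returned
        score = sum(w for term, w in functions if matches(title, term))
        results.append((doc_id, score))
    # Highest score first; ties broken by id only to keep the output stable.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


if __name__ == "__main__":
    for doc_id, score in function_score(TITLES, "one two three", FUNCTIONS):
        print(f"{doc_id:>2} {TITLES[doc_id]:<17} {score}")
```

The resulting order (6 and 13 with 6, then 7 and 14 with 5, then 5 with 4, and so on) matches the expected table from the question, with document 8 absent. A design consequence worth noting: because the TF/IDF score is replaced, documents with the same set of keywords always tie, regardless of title length.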
    

The original question, "elasticsearch - Elasticsearch query-time boosting produces inadequate result order", can be found on Stack Overflow: https://stackoverflow.com/questions/37372951/
