azure - How to match this query in Azure Search

Tags: azure keyword analyzer azure-cognitive-search

I have this index:

{
  "name": "testentities",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true,
      "retrievable": true,
       "filterable": true,
       "sortable": true
    },
    {
      "name": "entity_id",
      "type": "Edm.String",
      "searchable": true,
      "sortable": true,
      "facetable": false,
      "retrievable": true,
      "filterable": true,
      "searchAnalyzer":"standard",
      "indexAnalyzer": "custom_analyzer"
    },
    {
      "name": "description",
      "type": "Edm.String",
      "searchable": true,
      "sortable": false,
      "facetable": false,
      "retrievable": true,
      "filterable": true
    },
    {
      "name": "name",
      "type": "Edm.String",
      "searchable": true,
      "sortable": true,
      "facetable": false,
      "retrievable": true,
      "filterable": true
    },
    {
      "name": "entity_type",
      "type": "Edm.String",
      "searchable": true,
      "sortable": true,
      "facetable": true,
      "retrievable": true,
      "filterable": true
    },
    {
      "name": "ancestors",
      "type": "Collection(Edm.String)",
      "searchable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": true,
      "filterable": true
    },
    {
      "name": "calendar_id",
      "type": "Edm.String",
      "searchable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": false,
      "filterable": false
    },
    {
      "name": "currency",
      "type": "Edm.String",
      "searchable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": false,
      "filterable": false
    },
    {
      "name": "timezone",
      "type": "Edm.String",
      "searchable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": false,
      "filterable": false
    },
    {
      "name": "active",
      "type": "Edm.Boolean",
      "retrievable": true,
      "facetable": true,
      "filterable": true
    },
    {
      "name": "kpi_collection",
      "type": "Edm.String",
      "searchable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": false,
      "filterable": false
    },
    {
      "name": "rid",
      "type": "Edm.String"
    }
  ],
  "scoringProfiles": [
    {
      "name": "boostEntity",
      "text": {
        "weights": {
          "entity_id": 9,
          "name": 8,
          "description": 1
        }
      }
    }
  ],
  "analyzers": [
    {
      "name": "custom_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer":"token1",
      "tokenFilters": [
        "lowercase",
        "entityID_stopWords",
        "entityID_edgeNGram"

      ]
    }
  ],
  "tokenizers":[  
   {  
      "name":"token1",  
      "@odata.type":"#Microsoft.Azure.Search.StandardTokenizerV2"
   }
   ],
  "tokenFilters": [
    {
      "name": "entityID_edgeNGram",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
      "minGram": 1,
      "maxGram": 6
    },
    {
      "name": "entityID_stopWords",
      "@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
      "stopwords": [
        "store",
        "region",
        "zone",
        "field_org",
        ":"
      ]
    }
  ]
}

If I run this query:

{
  "search": "0001",
  "filter": "entity_type eq 'store' ",
  "select": "name,entity_id,entity_type,description,active,ancestors",
  "count": "true"
}

I get this result, which is correct, because it matches on name, the field with the next-highest weight after entity_id.

"@odata.count": 1,
"value": [
    {
        "@search.score": 1.6654625,
        "name": "LensCrafters 0001",
        "entity_id": "store:1",
        "entity_type": "store",
        "description": "2130 Mall Road, Florence, 41042, KY, US",
        "active": true,
        "ancestors": [
            "region:1021",
            "zone:1123",
            "field_org:lenscrafters_na",
            "ROOT"
        ]
    }
]

}

But if I run this query:

{
  "search": "1",
  "filter": "entity_type eq 'store' ",
  "select":"name,entity_id,entity_type,description,active,ancestors",
  "count": "true"

}

I get results that are not correct:

 {
            "@search.score": 1.4522386,
            "name": "LensCrafters 1622",
            "entity_id": "store:1622",
            "entity_type": "store",
            "description": "31625 Pacific Hwy S, Spc #E-1, Federal Way, 98003-5645, WA, US",
            "active": true,
            "ancestors": [
                "region:1024",
                "zone:1107",
                "field_org:lenscrafters_na",
                "ROOT"
            ]
        },
        {
            "@search.score": 1.3403159,
            "name": "LensCrafters 1178",
            "entity_id": "store:1178",
            "entity_type": "store",
            "description": "1 W FlatIron Crossing Dr #1104, Broomfield, 80021-8881, CO, US",
            "active": true,
            "ancestors": [
                "region:1019",
                "zone:1122",
                "field_org:lenscrafters_na",
                "ROOT"
            ]
        },
        { 
...............

Even though entity_id has a weight of 9 in the scoring profile, why is the result not the following?

 "@odata.count": 1,
    "value": [
        {
            "@search.score": 1.6654625,
            "name": "LensCrafters 0001",
            "entity_id": "store:1",
            "entity_type": "store",
            "description": "2130 Mall Road, Florence, 41042, KY, US",
            "active": true,
            "ancestors": [
                "region:1021",
                "zone:1123",
                "field_org:lenscrafters_na",
                "ROOT"
            ]
        }
    ]
}

Here is the scoring profile:

"scoringProfiles": [
        {
            "name": "boostEntity",
            "text": {
                "weights": {
                    "entity_id": 9,
                    "name": 8,
                    "description": 1
                }
            },
            "functions": [],
            "functionAggregation": null
        }
    ],.............

Best answer

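You are using a custom analyzer on the entity_id field. For the text store:1178, that analyzer produces the following tokens: 1, 11, 117, 1178 (you can test your analyzer configuration with the Analyze API). This means that the documents LensCrafters 1622 and LensCrafters 1178 match the query just as the document LensCrafters 0001 does: they all have the token 1 in the entity ID. However, the documents LensCrafters 1622 and LensCrafters 1178 also match 1 in the description. Therefore, they score higher than LensCrafters 0001.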
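To make the token list above concrete, here is a minimal Python sketch that imitates the index-time chain of custom_analyzer (tokenize, lowercase, remove stopwords, edge n-gram with minGram 1 and maxGram 6). It is only an approximation: the regex split is an assumed stand-in for the standard tokenizer, and the function names are hypothetical. The Analyze API remains the authoritative way to check what the service actually emits.

```python
import re

# Stopwords copied from the entityID_stopWords filter in the index definition.
STOPWORDS = {"store", "region", "zone", "field_org", ":"}


def tokenize(text):
    """Assumed stand-in for the standard tokenizer: split on non-word characters."""
    return re.findall(r"\w+", text)


def edge_ngrams(token, min_gram=1, max_gram=6):
    """Return the front-anchored prefixes of a token, min_gram to max_gram long."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_analyze(text):
    """Approximate custom_analyzer: tokenize, lowercase, drop stopwords, edge n-gram."""
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [t for t in tokens if t not in STOPWORDS]
    out = []
    for t in tokens:
        out.extend(edge_ngrams(t))
    return out


print(index_analyze("store:1178"))  # ['1', '11', '117', '1178']
print(index_analyze("store:1"))     # ['1']
print(index_analyze("store:1622"))  # ['1', '16', '162', '1622']
```

Every one of these entity IDs ends up with the token 1 in the index, so a search for 1 matches entity_id in all three documents.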

To learn more about query processing and custom analyzers in Azure Search, read: How full text search works in Azure Search.

Do you want to keep the edgeNGram token filter in your analysis chain? If so, why?
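As a rough illustration of what is at stake in that question, the Python sketch below compares which entity IDs contain the query term 1 with and without the edge n-gram step. The tokenizer is an assumed regex approximation and the helper names are hypothetical. It looks at the entity_id field only, so documents can still match the query through other fields such as description.

```python
import re

# Stopwords copied from the entityID_stopWords filter in the index definition.
STOPWORDS = {"store", "region", "zone", "field_org", ":"}


def analyze(text, use_edge_ngram, max_gram=6):
    """Approximate index-time tokens for entity_id, with the n-gram step optional."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    tokens = [t for t in tokens if t not in STOPWORDS]
    if not use_edge_ngram:
        return set(tokens)
    # Front-anchored prefixes of each token, from 1 up to max_gram characters.
    return {t[:n] for t in tokens for n in range(1, min(len(t), max_gram) + 1)}


def matching_ids(query_term, entity_ids, use_edge_ngram):
    """Return the entity IDs whose indexed tokens contain the query term."""
    return [e for e in entity_ids if query_term in analyze(e, use_edge_ngram)]


ENTITY_IDS = ["store:1", "store:1622", "store:1178"]

with_ngrams = matching_ids("1", ENTITY_IDS, use_edge_ngram=True)
without_ngrams = matching_ids("1", ENTITY_IDS, use_edge_ngram=False)
print(with_ngrams)     # ['store:1', 'store:1622', 'store:1178']
print(without_ngrams)  # ['store:1']
```

Under these assumptions, dropping the filter makes entity_id an exact-number match, at the cost of losing prefix matching on IDs (searching 16 would no longer find store:1622).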

Regarding azure - How to match this query in Azure Search, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46120826/
