Why does my Lucene 4.10 query only match the field if I add a wildcard to the end of the value?
I have a field named acoustid that is defined with KeywordAnalyzer
ACOUSTID("acoustid",IndexFieldTypes.TEXT_NOT_STORED_ANALYZED_NO_NORMS, new KeywordAnalyzer()),
If I run my query like this, I get no matches
query=acoustid:ae8f4538-9971-41b3-a6d0-bbca1c13e855
But if I add a wildcard, I get the correct match
query=acoustid:ae8f4538-9971-41b3-a6d0-bbca1c13e855*
Note that the query is escaped for Lucene before it reaches Lucene.
I have another field (reid) that also uses KeywordAnalyzer to store a GUID, and it works fine.
query=reid:425cf29a-1490-43ab-abfa-7b17a2cec351
I can't make sense of this, because I can't see any other data after the value, and my unit tests, for example
@Test
public void testFindReleaseByAcoustId() throws Exception {
    Results res = ss.search("acoustid:1d9e8ed6-3893-4d3b-aa7d-6cd79609e389", 0, 10);
    assertEquals(1, res.getTotalHits());
    assertEquals("1d9e8ed6-3893-4d3b-aa7d-6cd79609e386", getReleaseId(res.results.get(0).getDoc()));
}
work fine.
What should my next step be?
Update
I remembered that I had added an option to explain the query, so here it is with the wildcard
Query:+acoustid:ae8f4538-9971-41b3-a6d0-bbca1c13e855* +src:1
0:Score:100.0
ba938fab-22b1-42ba-9bda-47261bc0569d:Now That's What I Call the 90s
2.954172 = (MATCH) sum of:
0.3385043 = (MATCH) ConstantScore(acoustid:ae8f4538-9971-41b3-a6d0-bbca1c13e855), product of:
1.0 = boost
0.3385043 = queryNorm
2.6156676 = (MATCH) weight(src:1 in 9) [DefaultSimilarity], result of:
2.6156676 = score(doc=9,freq=1.0 = termFreq=1.0 ), product of:
0.9409648 = queryWeight, product of:
2.779772 = idf(docFreq=2052700, maxDocs=12169449)
0.3385043 = queryNorm
2.779772 = fieldWeight in 9, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
2.779772 = idf(docFreq=2052700, maxDocs=12169449)
1.0 = fieldNorm(doc=9)
And here it is without
Query:+(acoustid:ae8f4538 acoustid:9971 acoustid:41b3 acoustid:a6d0 acoustid:bbca1c13e855) +src:1
So clearly the '-' hyphen is causing the problem: without the wildcard, the value is broken up into separate terms.
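To illustrate the splitting seen in the parsed query above, here is a plain-Java sketch. It is not Lucene code: the class and method names are made up for illustration, and the two methods only stand in for "keep the value as one term" versus "break on non-alphanumeric characters". It assumes the non-wildcard query went through an analyzer that splits on the hyphen, which is what the parsed query suggests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; it only imitates the two tokenization behaviours.
class HyphenTokenDemo {

    // Stand-in for a keyword-style analyzer: the whole input is a single term.
    static List<String> keywordTerms(String text) {
        return Collections.singletonList(text);
    }

    // Stand-in for a word-splitting analyzer: breaks on anything
    // that is not an ASCII letter or digit, so hyphens separate terms.
    static List<String> splitTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String guid = "ae8f4538-9971-41b3-a6d0-bbca1c13e855";
        System.out.println(keywordTerms(guid)); // one term, hyphens intact
        System.out.println(splitTerms(guid));   // five terms, one per GUID segment
    }
}
```

If the index holds the GUID as one term but the query is split into five, none of the five pieces equals the indexed term, which would explain the missing match.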
My working query on the similar reid field gives
Query:+reid:c3c0e462-1606-40dc-9667-1b26b9fb44c5 +src:1
0:Score:100.0
c3c0e462-1606-40dc-9667-1b26b9fb44c5:Liquid Tension Experiment
16.852135 = (MATCH) sum of:
16.39361 = (MATCH) weight(reid:c3c0e462-1606-40dc-9667-1b26b9fb44c5 in 552496) [DefaultSimilarity], result of:
16.39361 = score(doc=552496,freq=1.0 = termFreq=1.0 ), product of:
0.9863018 = queryWeight, product of:
16.621292 = idf(docFreq=1, maxDocs=12169449)
0.059339657 = queryNorm
16.621292 = fieldWeight in 552496, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
16.621292 = idf(docFreq=1, maxDocs=12169449)
1.0 = fieldNorm(doc=552496)
0.4585254 = (MATCH) weight(src:1 in 552496) [DefaultSimilarity], result of:
0.4585254 = score(doc=552496,freq=1.0 = termFreq=1.0 ), product of:
0.16495071 = queryWeight, product of:
2.779772 = idf(docFreq=2052700, maxDocs=12169449)
0.059339657 = queryNorm
2.779772 = fieldWeight in 552496, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
2.779772 = idf(docFreq=2052700, maxDocs=12169449)
1.0 = fieldNorm(doc=552496)
Ah, I may have found the problem, but I have to rebuild the index to check.
reid is defined with IndexFieldTypes.TEXT_STORED_NOT_ANALYZED_NO_NORMS, whereas acoustid is defined with IndexFieldTypes.TEXT_NOT_STORED_ANALYZED_NO_NORMS
Best answer
Try the following:
WildcardQuery q = new WildcardQuery(new Term("acoustid", "ae8f4538-9971-41b3-a6d0-bbca1c13e855*"));
q.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_REWRITE);
Query rewritten = searcher.rewrite(q);
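and look at the rewritten query (via toString() or a debugger).
The rewritten query is a boolean query made up of single term-query clauses, so it will reflect the terms that are really in the index.
UPD: in Lucene 4 the middle line should be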
q.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);