lucene - Using synonyms with Lucene

What is the best way to handle synonyms (phrases) with Lucene?
In particular, when I need to run queries such as: a OR b OR c NOT d

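How about adding a new field called "synonyms" to each document at index time?
The value of this field would hold the list of all synonyms. It would be added to a document only if that document has any synonyms.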

I would then run an "OR" search query that looks for the search keyword in this field as well as in the other fields.
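
To make the idea concrete, here is a minimal sketch in plain Java that only builds the query string for such a search, using Lucene's classic query syntax (field:"phrase" clauses joined with OR). The class name SynonymFieldQuery and the field names "content" and "synonyms" are assumptions chosen for illustration; the resulting string would still have to be handed to a query parser.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds an OR query that looks for one phrase
// in several fields, e.g. a main text field plus a "synonyms" field.
class SynonymFieldQuery {

    // Wrap the phrase in double quotes, escaping backslashes and quotes
    // so the phrase cannot break out of the quoted clause.
    static String quote(String phrase) {
        return "\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produce: field1:"phrase" OR field2:"phrase" OR ...
    static String build(String phrase, List<String> fields) {
        String quoted = quote(phrase);
        return fields.stream()
                .map(field -> field + ":" + quoted)
                .collect(Collectors.joining(" OR "));
    }
}
```

For example, SynonymFieldQuery.build("top investment bank", List.of("content", "synonyms")) yields content:"top investment bank" OR synonyms:"top investment bank".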

Would this approach work for any kind of query?

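For reference,
the synonyms in my application are entirely custom rather than taken from an English dictionary, i.e. "global financial leader" can also mean "top investment bank" or "world top-500 financial company", and so on.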

Please advise.

Thanks.

Best answer

The Lucene project has a contribution called "wordnet". According to its documentation:

This package uses synonyms defined by WordNet to build a Lucene index storing them, which in turn can be used for query expansion. You normally run Syns2Index once to build the query index/"database", and then call SynExpand.expand(...) to expand a query.

It includes an example of what it does:

If you pass in the query "big dog" then it prints out:

Query: big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 vainglorious^0.9 vauntingly^0.9 dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9

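You can see that the original words ("big" and "dog") carry no added weight. The synonyms, however, carry a weight (0.9) that you can configure yourself.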
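
As a rough illustration of that weighting, the following is a minimal sketch in plain Java. It is not the contrib package's SynExpand code: the class name SynonymQueryExpander, the in-memory synonym table, and the sample entries are all assumptions made up for this example. It keeps each original term unweighted and appends every synonym with a configurable boost, which is the shape of the output quoted above.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical expander backed by a custom synonym table
// (single-word keys only; this is an illustration, not Lucene's API).
class SynonymQueryExpander {
    private final Map<String, List<String>> synonyms;
    private final float boost;

    SynonymQueryExpander(Map<String, List<String>> synonyms, float boost) {
        this.synonyms = synonyms;
        this.boost = boost;
    }

    // Expand "big dog" into "big large^0.9 ... dog hound^0.9 ...".
    String expand(String query) {
        StringJoiner out = new StringJoiner(" ");
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input produces a single empty token
            }
            out.add(term); // original term keeps the default weight
            for (String syn : synonyms.getOrDefault(term, List.of())) {
                // Multi-word synonyms are quoted so they stay a phrase.
                String clause = syn.contains(" ") ? "\"" + syn + "\"" : syn;
                out.add(clause + "^" + boost);
            }
        }
        return out.toString();
    }
}
```

With a table mapping "big" to ["large", "grown"] and "dog" to ["hound"], new SynonymQueryExpander(table, 0.9f).expand("big dog") returns big large^0.9 grown^0.9 dog hound^0.9. Because the lookup is per word, phrase-level keys such as "global financial leader" would need a phrase-aware matching step that this sketch does not include.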

It is bundled with the standard Lucene distribution, in the "contrib" directory.

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/1248039/
