java - Lucene 3.5 does not support Chinese, Russian, Korean when searching

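I am using the Lucene 3.5 StandardAnalyzer for indexing and searching. It works for every language except Chinese, Japanese and Korean. I have also tried CJKAnalyzer and ChineseAnalyzer, but searching still does not work. The index is created correctly; we verified this with the Luke tool. However, words in these languages cannot be found, either through Luke or through our code that uses the analyzer. Is there a solution?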

伊拉克航空公司               

    +name:伊拉克航空公司~0.9

This is the Lucene query generated by the analyzer for this Chinese word, but it returns no results. The corresponding queries for other languages do return results.

Best answer

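For Chinese, there are several useful third-party analyzers, for example:

mmseg4j
IK-analyzer
ansj_seg
imdict-chinese-analyzer

I recommend IK Analyzer. To use it, add it to your dependencies: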

    <dependency>
        <groupId>com.janeluo</groupId>
        <artifactId>ikanalyzer</artifactId>
        <version>2012_u6</version>
    </dependency>

Sample code:

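import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class LuceneFirst {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new IKAnalyzer();
        // The field name is irrelevant here; we only want to inspect the tokens.
        // Note: on a Lucene version whose tokenStream only accepts a Reader,
        // wrap the text in a java.io.StringReader instead of passing a String.
        TokenStream tokenStream = analyzer.tokenStream("", "伊拉克航空公司");

        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
        tokenStream.reset();
        // Print each token together with its start and end offsets.
        while (tokenStream.incrementToken()) {
            System.out.println("start→" + offsetAttribute.startOffset());
            System.out.println(charTermAttribute);
            System.out.println("end→" + offsetAttribute.endOffset());
        }
        tokenStream.end();
        tokenStream.close();
    }
}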

Output:

    start→0
    伊拉克
    end→3
    start→3
    航空公司
    end→7
    start→3
    航空
    end→5
    start→5
    公司
    end→7
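The segmentation above also hints at one possible reason the original query finds nothing. This is an assumption, not something the question confirms: a bigram-based analyzer such as Lucene's CJKAnalyzer indexes overlapping two-character tokens, so if the query term is not segmented the same way, the whole seven-character string never equals any indexed term. The sketch below uses plain Java only (no Lucene); the class `BigramSketch` and the method `bigrams` are hypothetical names that merely imitate that tokenization for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Produces overlapping two-character tokens, imitating (not reusing)
    // the way a bigram-based CJK analyzer splits text.
    // Assumes every character fits in a single Java char, which holds for the sample text.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "伊拉克航空公司";
        List<String> indexed = bigrams(query);
        // The terms an index built this way would contain.
        System.out.println(indexed);
        // The unsegmented query string is not among them.
        System.out.println(indexed.contains(query));
    }
}
```

If this is what is happening, the usual remedy is to run the query text through the same analyzer that built the index, so that both sides produce the same terms.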

For Japanese:

Regarding java - Lucene 3.5 does not support Chinese, Russian, Korean when searching, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48055694/
