java - Lucene Highlighter

Tags: java lucene lucene-highlighter

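How does the Lucene 4.3.1 Highlighter work? I want to print search results from the documents as the search term plus the 8 words that follow it. How can I do that with the Highlighter class? I have added complete txt, html and xml documents to a file and added them to my index. I now have the search code below, to which I would like to add highlighting: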

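import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.Date;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// doPagingSearch(...) below is the helper from the Lucene demo class SearchFiles (not shown here)
String index = "index";
String field = "contents";
String queries = null;         // optional file of queries; null = read queries from stdin
int repeat = 1;
boolean raw = true;            // in the demo's doPagingSearch, raw = print only the doc id and score
String queryString = null;     // keep null, prompt the user for it later
int hitsPerPage = 10;          // leave it at 10 for now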

//need to add all files to same directory
index = "C:\\Users\\plib\\Documents\\index";
repeat = 4;


IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);

BufferedReader in = null;
if (queries != null) {
  in = new BufferedReader(new InputStreamReader(new FileInputStream(queries), "UTF-8"));
} else {
  in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
}
QueryParser parser = new QueryParser(Version.LUCENE_43, field, analyzer);
while (true) {
  if (queries == null && queryString == null) {                        // prompt the user
    System.out.println("Enter query. 'quit' = quit: ");
  }

  String line = queryString != null ? queryString : in.readLine();

  if (line == null) {          // end of input (length() is never -1, so that check was dead code)
    break;
  }

  line = line.trim();
  if (line.length() == 0 || line.equalsIgnoreCase("quit")) {
    break;
  }

  Query query = parser.parse(line);
  System.out.println("Searching for: " + query.toString(field));

  if (repeat > 0) {                           // repeat & time as benchmark
    Date start = new Date();
    for (int i = 0; i < repeat; i++) {
      searcher.search(query, null, 100);
    }
    Date end = new Date();
    System.out.println("Time: "+(end.getTime()-start.getTime())+"ms");
  }

  doPagingSearch(in, searcher, query, hitsPerPage, raw, queries == null && queryString == null);

  if (queryString != null) {
    break;
  }
}
reader.close();
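The question asks for the matched term followed by the next 8 words. As far as I know, Lucene's fragmenters size fragments in characters rather than words, so an exact word window probably needs some post-processing of the stored text. Below is a minimal, Lucene-free sketch of that post-processing, assuming the field value is available as a plain String and the query is a single term. `SnippetDemo` and `snippetsAfter` are hypothetical names, and splitting on `\W+` is only a crude stand-in for a real analyzer (for example, it treats non-ASCII letters as separators).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: returns each occurrence of a single
// search term together with up to `wordsAfter` following words.
final class SnippetDemo {

    static List<String> snippetsAfter(String text, String term, int wordsAfter) {
        // Crude tokenization: split on runs of non-word characters.
        String[] words = text.split("\\W+");
        List<String> snippets = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (int i = 0; i < words.length; i++) {
            // Case-insensitive comparison, roughly what a lowercasing analyzer gives you.
            if (words[i].toLowerCase(Locale.ROOT).equals(needle)) {
                // Clamp the window so it never runs past the end of the text.
                int end = Math.min(words.length, i + 1 + wordsAfter);
                snippets.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
            }
        }
        return snippets;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. The highlighter marks each hit in the stored text.";
        for (String s : snippetsAfter(text, "highlighter", 8)) {
            System.out.println(s);
        }
    }
}
```

In a real search loop you would call such a helper on the stored `contents` value of each hit instead of the sample string.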

Best answer

I had the same question and eventually stumbled upon this post:

http://vnarcher.blogspot.ca/2012/04/highlighting-text-with-lucene.html

The key part is that, while iterating over the results, you call getHighlightedField on each result value you want to highlight.

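// Needs: org.apache.lucene.analysis.Analyzer, org.apache.lucene.search.Query,
// org.apache.lucene.search.highlight.*, java.io.IOException
private String getHighlightedField(Query query, Analyzer analyzer, String fieldName, String fieldValue) throws IOException, InvalidTokenOffsetsException {
    // Wrap every matched term in <span class="MatchedText">...</span>
    Formatter formatter = new SimpleHTMLFormatter("<span class=\"MatchedText\">", "</span>");
    QueryScorer queryScorer = new QueryScorer(query);
    Highlighter highlighter = new Highlighter(formatter, queryScorer);
    // A fragment size of Integer.MAX_VALUE keeps the whole field value as one fragment
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(queryScorer, Integer.MAX_VALUE));
    // Analyze the entire text rather than stopping at the default character limit
    highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
    // Use the analyzer passed in (ideally the same one used at index time); returns null if nothing matched
    return highlighter.getBestFragment(analyzer, fieldName, fieldValue);
}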

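This assumes the output will be HTML: it simply wraps the highlighted text in a <span> with the CSS class MatchedText. You can then define a custom CSS rule to style the highlighted text however you like.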
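To make the effect of `SimpleHTMLFormatter` concrete, here is a stdlib-only sketch that loops over some hypothetical hit values and wraps matched words in the same `<span class="MatchedText">` tags. It only imitates the output: the real `Highlighter` works from the analyzer's token stream and the `QueryScorer`, so stemming, phrase queries and fragment scoring are not reproduced. `SpanHighlightDemo`, `highlight` and the sample strings are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: imitates the pre/post tags SimpleHTMLFormatter puts around
// matched terms. It is not how Lucene's Highlighter is implemented.
final class SpanHighlightDemo {

    private static final Pattern WORD = Pattern.compile("\\w+");

    // `terms` is expected to contain lowercase terms.
    static String highlight(String text, Set<String> terms, String pre, String post) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());          // copy the separator text unchanged
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append(pre).append(word).append(post);
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text.substring(last));              // trailing punctuation, if any
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stored field values standing in for each hit's "contents".
        List<String> hits = Arrays.asList(
            "Lucene ships a highlighter module.",
            "The Highlighter wraps matches in tags.");
        Set<String> terms = new HashSet<>(Arrays.asList("highlighter"));
        for (String value : hits) {
            System.out.println(highlight(value, terms, "<span class=\"MatchedText\">", "</span>"));
        }
    }
}
```

A CSS rule such as `.MatchedText { background: yellow; }` would then style the wrapped words.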

Source: the Stack Overflow question "java - Lucene Highlighter": https://stackoverflow.com/questions/17535514/