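I'm integrating Lucene into our Spring MVC-based project. So far it works well, except for searches that involve numbers. Whenever I search for "123Ab", "123", or anything else that contains digits, I get no results. As soon as I remove the digits, it works fine.
Any suggestions? Thanks.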
Search code:
public List<Integer> searchLucene(String text, long groupId, boolean type) {
    List<Integer> objectIds = new ArrayList<>();
    if (text != null) {
        // Escape Lucene query-syntax characters
        //String specialChars = "+ - && || ! ( ) { } [ ] ^ \" ~ * ? : \\ /";
        text = text.replace("+", "\\+");
        text = text.replace("-", "\\-");
        text = text.replace("&&", "\\&&");
        text = text.replace("||", "\\||");
        text = text.replace("!", "\\!");
        text = text.replace("(", "\\(");
        text = text.replace(")", "\\)");
        text = text.replace("{", "\\{");
        text = text.replace("}", "\\}");
        text = text.replace("[", "\\[");
        text = text.replace("]", "\\]");
        text = text.replace("^", "\\^");
        // text = text.replace("\"","\\\"");
        text = text.replace("~", "\\~");
        text = text.replace("*", "\\*");
        text = text.replace("?", "\\?");
        text = text.replace(":", "\\:");
        //text = text.replace("\\","\\\\");
        text = text.replace("/", "\\/");
        try {
            Path path;
            //Set system path code
            Directory directory = FSDirectory.open(path);
            IndexReader indexReader = DirectoryReader.open(directory);
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            QueryParser queryParser = new QueryParser("contents", new SimpleAnalyzer());
            // Parse the escaped text as a prefix search on the "contents" field
            Query query = queryParser.parse(text + "*");
            TopDocs topDocs = indexSearcher.search(query, 50);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                org.apache.lucene.document.Document document = indexSearcher.doc(scoreDoc.doc);
                objectIds.add(Integer.valueOf(document.get("id")));
                System.out.println("");
                System.out.println("id " + document.get("id"));
                System.out.println("content " + document.get("contents"));
            }
            indexSearcher.getIndexReader().close();
            directory.close();
            return objectIds;
        } catch (Exception ignored) {
            // Any exception (I/O, ParseException, ...) is silently swallowed here
        }
    }
    return null;
}
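None of the characters escaped above is a digit, so the escaping is not what loses the numbers. As an aside, the replace chain can be condensed into a single pass. The sketch below uses a hypothetical class QueryEscaper that prefixes each special character with a backslash; its character set is assumed to match what Lucene's classic QueryParser.escape(String) escapes, so calling that static method directly is likely the simpler option.

```java
// Hypothetical helper (not part of the question's code): escapes Lucene
// query-syntax characters in one pass instead of a chain of String.replace calls.
final class QueryEscaper {
    // Assumed to be the same set Lucene's classic QueryParser escapes:
    // \ + - ! ( ) : ^ [ ] " { } ~ * ? | & /
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("123Ab")); // prints 123Ab (digits and letters untouched)
        System.out.println(escape("a{b}"));  // prints a\{b\}
    }
}
```

Because each character is handled once, a backslash in the input is escaped without being double-escaped, which is the ordering problem the commented-out backslash line in the original chain would have run into.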
Indexing code:
@Override
public void saveIndexes(String text, String tagFileName, String filePath, long groupId, boolean type, int objectId) {
    try {
        //indexing directory
        File testDir;
        Path path1;
        Directory index_dir;
        if (type) {
            // System path code
            Directory directory = org.apache.lucene.store.FSDirectory.open(path);
            IndexWriterConfig config = new IndexWriterConfig(new SimpleAnalyzer());
            IndexWriter indexWriter = new IndexWriter(directory, config);
            org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
            if (filePath != null) {
                File file = new File(filePath); // current directory
                doc.add(new TextField("path", file.getPath(), Field.Store.YES));
            }
            doc.add(new StringField("id", String.valueOf(objectId), Field.Store.YES));
            // doc.add(new TextField("id",String.valueOf(objectId),Field.Store.YES));
            if (text == null) {
                if (filePath != null) {
                    // Read the whole file and append the tag file name
                    FileInputStream is = new FileInputStream(filePath);
                    BufferedReader reader = new BufferedReader(new InputStreamReader(is));
                    StringBuilder stringBuffer = new StringBuilder();
                    String line;
                    while ((line = reader.readLine()) != null) {
                        stringBuffer.append(line).append("\n");
                    }
                    stringBuffer.append("\n").append(tagFileName);
                    reader.close();
                    doc.add(new TextField("contents", stringBuffer.toString(), Field.Store.YES));
                }
            } else {
                text = text + "\n" + tagFileName;
                doc.add(new TextField("contents", text, Field.Store.YES));
            }
            indexWriter.addDocument(doc);
            indexWriter.commit();
            indexWriter.flush();
            indexWriter.close();
            directory.close();
        }
    } catch (Exception ignored) {
        // Any exception is silently swallowed here
    }
}
I have tried both with and without the wildcard (*). Thanks.
Best answer
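The problem is in your indexing code. Your contents field is a TextField and you are using SimpleAnalyzer. The SimpleAnalyzer documentation says:
An Analyzer that filters LetterTokenizer with LowerCaseFilter
LetterTokenizer splits text at every non-letter character and keeps only runs of letters, so because your field is tokenized, the digits are discarded. For example, "123Ab" ends up in the index as the single term "ab".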
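A rough way to see this is a plain-Java model of SimpleAnalyzer's two steps. SimpleAnalyzerModel is a hypothetical name and this is not Lucene code: it splits on any code point for which Character.isLetter is false, then lowercases each run, ignoring details such as the tokenizer's maximum token length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java model (not Lucene code) of SimpleAnalyzer:
// LetterTokenizer keeps maximal runs of letters, LowerCaseFilter lowercases them.
final class SimpleAnalyzerModel {
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int cp : text.codePoints().toArray()) {
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp); // still inside a run of letters
            } else {
                flush(current, out);         // digits, spaces, punctuation end a token and are dropped
            }
        }
        flush(current, out);                 // emit the final run, if any
        return out;
    }

    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString().toLowerCase(Locale.ROOT));
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokens("123Ab"));              // [ab]
        System.out.println(tokens("Invoice 2017-05 ok")); // [invoice, ok]
        System.out.println(tokens("123"));                // []
    }
}
```

A value made only of digits, such as "123", produces no terms at all, so there is nothing in the index for a numeric search to hit.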
Now look at the TextField source: a TextField is always tokenized, whether it is TYPE_STORED or TYPE_NOT_STORED.
So if you want both letters and digits to be indexed, you need to use a StringField instead of a TextField.
From the StringField documentation:
A field that is indexed but not tokenized: the entire String value is indexed as a single token. For example this might be used for a 'country' field or an 'id' field, or any field that you intend to use for sorting or access through the field cache.
A StringField is never tokenized, whether it is TYPE_STORED or TYPE_NOT_STORED.
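To contrast the two field types, the sketch below (hypothetical FieldTermsModel, a model rather than Lucene code) lists the terms each would contribute for the same value, assuming the TextField is analyzed with SimpleAnalyzer as in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Model (not Lucene code) of the terms each field type contributes for one value.
final class FieldTermsModel {
    // TextField + SimpleAnalyzer: split on non-letters, drop empty pieces, lowercase.
    static List<String> textFieldTerms(String value) {
        return Arrays.stream(value.split("[^\\p{L}]+"))
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // StringField: no analysis, so the whole value is one case-preserving term.
    static List<String> stringFieldTerms(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        String value = "123Ab report";
        System.out.println(textFieldTerms(value));   // [ab, report]
        System.out.println(stringFieldTerms(value)); // [123Ab report]
    }
}
```

Because the StringField term is the entire, case-preserved value, a prefix search against it has to match from the start of that whole value.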
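So with your current setup, the digits are stripped from the contents field at index time and never make it into the index, which is why searches for those patterns find nothing.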
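The effect on the search side can be modeled in plain Java as well (hypothetical PrefixSearchModel, not Lucene code): index terms come from the same SimpleAnalyzer model, and a prefix search counts as a hit if any indexed term starts with the prefix. The search prefixes are written in lowercase here, assuming they are lowercased like the index terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Model (not Lucene code): which prefix searches can hit a TextField indexed with SimpleAnalyzer.
final class PrefixSearchModel {
    // Index side: only lowercase letter runs survive; digits never reach the index.
    static List<String> indexTerms(String contents) {
        List<String> terms = new ArrayList<>();
        for (String piece : contents.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Search side: a prefix search hits if any indexed term starts with the prefix.
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("Order 123Ab shipped"); // [order, ab, shipped]
        System.out.println(prefixMatches(terms, "123"));   // false
        System.out.println(prefixMatches(terms, "123ab")); // false
        System.out.println(prefixMatches(terms, "ab"));    // true
    }
}
```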
Instead of using QueryParser for complex searches, first verify the terms in your index with a simple query like this:
Query wildcardQuery = new WildcardQuery(new Term("contents", searchString));
TopDocs hits = searcher.search(wildcardQuery, 20);
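A WildcardQuery built this way is a useful check because its pattern is compared directly against the indexed terms, without passing through an analyzer. The model below (hypothetical WildcardModel; regex-based rather than Lucene's automaton implementation, and ignoring backslash escapes) mimics those semantics: * matches any sequence, ? matches exactly one character, and matching is case-sensitive.

```java
import java.util.List;
import java.util.regex.Pattern;

// Model (not Lucene's implementation) of WildcardQuery matching against raw index terms:
// '*' = any sequence (including empty), '?' = exactly one character, no lowercasing.
final class WildcardModel {
    static boolean matches(String term, String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int cp : wildcard.codePoints().toArray()) {
            if (cp == '*') {
                regex.append(".*");
            } else if (cp == '?') {
                regex.append('.');
            } else {
                // every other character is matched literally
                regex.append(Pattern.quote(new String(Character.toChars(cp))));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    static long countMatches(List<String> terms, String wildcard) {
        return terms.stream().filter(t -> matches(t, wildcard)).count();
    }

    public static void main(String[] args) {
        // Terms the SimpleAnalyzer model yields for "Order 123Ab shipped"
        List<String> textFieldTerms = List.of("order", "ab", "shipped");
        System.out.println(countMatches(textFieldTerms, "123*")); // 0
        System.out.println(countMatches(textFieldTerms, "ab*"));  // 1
        System.out.println(countMatches(textFieldTerms, "Ab*"));  // 0 (terms are lowercase)
        // The same text in a StringField is a single term
        List<String> stringFieldTerms = List.of("Order 123Ab shipped");
        System.out.println(countMatches(stringFieldTerms, "*123*")); // 1
    }
}
```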
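Also, to find out whether to focus your debugging on the indexing side or the search side, use the Luke tool to check whether the terms are created the way you need. If the terms are there, you can focus on the search code.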
Regarding "java - Java, Lucene: Search with numbers as String not working", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43886635/