java - Lucene Java opens too many files. Am I using IndexWriter correctly?

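My Lucene Java implementation is holding too many files open. I followed the instructions on the Lucene Wiki about too many open files, but that only slowed the problem down. Here is my code for adding objects (PTicket) to the index: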

//This gets called when the bean is instantiated
public void initializeIndex() {
    analyzer = new WhitespaceAnalyzer(Version.LUCENE_32);
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

}


public void addAllToIndex(Collection<PTicket> records) {  
    IndexWriter indexWriter = null;
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

    try{
        indexWriter = new IndexWriter(directory, config);
        for(PTicket record : records) {
            Document doc = new Document();
            StringBuffer documentText = new StringBuffer();
            doc.add(new Field("_id", record.getIdAsString(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("_type", record.getType(), Field.Store.YES, Field.Index.ANALYZED));

            for(String key : record.getProps().keySet()) {
                List<String> vals = record.getProps().get(key);

                for(String val : vals) {
                    addToDocument(doc, key, val);
                    documentText.append(val).append(" ");
                }
            }
            addToDocument(doc, DOC_TEXT, documentText.toString());        
            indexWriter.addDocument(doc);    
        }

        indexWriter.optimize();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        cleanup(indexWriter);
    }
}

private void cleanup(IndexWriter iw) {
    if(iw == null) {
        return;
    }

    try{
        iw.close();
    } catch (IOException ioe) {
        logger.error("Error trying to close index writer");
        logger.error("{}", ioe.getClass().getName());
        logger.error("{}", ioe.getMessage());
    }
}

private void addToDocument(Document doc, String field, String value) {
    doc.add(new Field(field, value, Field.Store.YES, Field.Index.ANALYZED));
}

Edit: adding the search code

public Set<Object> searchIndex(AthenaSearch search) {  

    try {
        Query q = new QueryParser(Version.LUCENE_32, DOC_TEXT, analyzer).parse(query);

        //search is actually instantiated in initialization.  Lucene recommends this.
        //IndexSearcher searcher = new IndexSearcher(directory, true);
        TopDocs topDocs = searcher.search(q, numResults);
        ScoreDoc[] hits = topDocs.scoreDocs;
        for(int i=start;i<hits.length;++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            ids.add(d.get("_id"));
        }
        return ids;
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

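This code runs in a web application.

1) Is this the advised way to use IndexWriter (instantiating a new one on each add to index)?

2) I've read that raising ulimit will help, but that seems like a band-aid that doesn't address the actual problem.

3) Could the problem lie with IndexSearcher?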

Best answer

1) Is this the advised way to use IndexWriter (instantiating a new one on each add to index)?

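I would advise against it. There are constructors that check whether an index already exists in the directory and either open it or create a new one. Problem 2 would be solved if you reuse the IndexWriter.

EDIT:

OK, it seems that in Lucene 3.2 all but one of the constructors are deprecated, so reuse of the IndexWriter can be achieved by using the enum IndexWriterConfig.OpenMode with the value CREATE_OR_APPEND.

Also, opening a new writer and closing it on every document add is inefficient, so I suggest reusing it. If you want to speed up indexing, raise the RAM buffer with setRAMBufferSizeMB; the default is 16 MB, so find a good value by trial and error.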
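The reuse advice above can be sketched without Lucene on the classpath. This is only an illustration of the lifecycle, not the asker's code: FakeWriter is a hypothetical stand-in for IndexWriter, IndexService is a hypothetical owner class, and the records are plain strings instead of PTicket objects. The point it shows is that the writer is opened once, shared by every batch, and closed only at shutdown.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for Lucene's IndexWriter; it only stores strings. */
class FakeWriter implements Closeable {
    final List<String> docs = new ArrayList<String>();
    private boolean closed = false;

    void addDocument(String doc) {
        if (closed) throw new IllegalStateException("writer is closed");
        docs.add(doc);
    }

    @Override
    public void close() { closed = true; }
}

/** Hypothetical service that owns one writer for the lifetime of the application. */
class IndexService implements Closeable {
    private FakeWriter writer;       // created lazily, then reused
    private int writersOpened = 0;   // counts how many writers were ever created

    // synchronized so concurrent web requests share a single writer instance
    synchronized void addAllToIndex(Collection<String> records) {
        if (writer == null) {
            // with Lucene this is where new IndexWriter(directory, config) would go
            writer = new FakeWriter();
            writersOpened++;
        }
        for (String record : records) {
            writer.addDocument(record);
        }
        // no close() here: the writer stays open for the next batch
    }

    synchronized int writersOpened() { return writersOpened; }

    synchronized int documentCount() { return writer == null ? 0 : writer.docs.size(); }

    // call once on application shutdown, not after every batch
    @Override
    public synchronized void close() {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }
}

class SharedWriterDemo {
    public static void main(String[] args) {
        IndexService service = new IndexService();
        service.addAllToIndex(Arrays.asList("ticket-1", "ticket-2"));
        service.addAllToIndex(Arrays.asList("ticket-3"));
        System.out.println("writers opened: " + service.writersOpened());
        System.out.println("documents: " + service.documentCount());
        service.close();
    }
}
```

In a web application the close() call would typically sit in whatever shutdown hook the container offers, which is an assumption about the deployment rather than something stated in the question.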

From the documentation:

Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open.

Also reuse the IndexSearcher. I cannot see the search code, but IndexSearcher is thread-safe and can also be opened read-only.
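The "point in time" behaviour quoted from the documentation, combined with a single shared searcher, can be illustrated with a small library-free sketch. TinyIndex and SearcherHolder are hypothetical names standing in for the index and for a shared IndexSearcher; the real classes are not used here. It shows one long-lived read-only view that every request reuses, and that only picks up new documents after an explicit reopen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an index: writers append, readers get immutable snapshots. */
class TinyIndex {
    private final List<String> docs = new ArrayList<String>();

    synchronized void add(String doc) { docs.add(doc); }

    // copy the current contents; later adds do not affect the returned list
    synchronized List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<String>(docs));
    }
}

/** Holds one shared, read-only view and swaps it only on an explicit reopen. */
class SearcherHolder {
    private final TinyIndex index;
    private final AtomicReference<List<String>> current;

    SearcherHolder(TinyIndex index) {
        this.index = index;
        this.current = new AtomicReference<List<String>>(index.snapshot());
    }

    // safe to call from many request threads; nothing is opened per request
    boolean contains(String doc) { return current.get().contains(doc); }

    // analogous to re-opening the searcher after the writer has added documents
    void reopen() { current.set(index.snapshot()); }
}

class SearcherReuseDemo {
    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("ticket-1");
        SearcherHolder searcher = new SearcherHolder(index);

        index.add("ticket-2");
        // still the old snapshot, so the new document is not visible yet
        System.out.println(searcher.contains("ticket-2"));

        searcher.reopen();
        // the refreshed snapshot includes the new document
        System.out.println(searcher.contains("ticket-2"));
    }
}
```

The design choice worth noting is that reads never open anything; only reopen() touches the index, so the number of open views stays constant regardless of request volume.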

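I also suggest setting a merge factor on the writer. It is not required, but it helps limit the number of inverted index files created; tune it by trial and error.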

This question, "java - Lucene Java opens too many files. Am I using IndexWriter correctly?", comes from Stack Overflow: https://stackoverflow.com/questions/6403606/
