java - Multiple CFS files are generated in Lucene 4.10.2

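I am trying to index about 612 records with Lucene 4.10.2. It creates a large number of CFS files in the index directory: about 626 of them. Indexing also takes longer than expected. Each CFS file is at most 3 KB.

Environment: Java 8, Windows 7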

Directory dir = FSDirectory.open(file);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_2, new ClassicAnalyzer());
if(bufferSizeMB != 0 && bufferSizeMB != -1){
    config.setRAMBufferSizeMB(bufferSizeMB);
}  else {
    config.setRAMBufferSizeMB(DEFAULT_RAM_BUFFER_SIZE_MB);
}      
config.setMaxBufferedDocs(1000);
config.setMaxBufferedDeleteTerms(1000);
config.setMergePolicy(new LogDocMergePolicy());
IndexWriter iwriter = new IndexWriter(dir, config);
iwriter.getConfig().setMaxBufferedDeleteTerms(1000);
iwriter.getConfig().setMaxBufferedDocs(1000);
iwriter.getConfig().setRAMBufferSizeMB(bufferSizeMB);

http://lucene.472066.n3.nabble.com/Multiple-CFS-files-are-generated-in-lucene-4-10-2-td4176336.html

Best answer

From the Lucene changes document:

  LUCENE-4462: DocumentsWriter now flushes deletes, segment infos and builds
  CFS files if necessary during segment flush and not during publishing. The latter
  was a single threaded process while now all IO and CPU heavy computation is done
  concurrently in DocumentsWriterPerThread.

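With per-segment flushing, merges are triggered according to your merge policy. Ideally, if indexing finishes normally and the writer is closed, only one CFS file should be left.

That is what I have observed in my application.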
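
To check this on your own index, something like the following could be used to count the `.cfs` files in the index directory before and after closing the writer. It is only a sketch using the standard `java.nio.file` API; the class name `CfsFileCounter` and the method `countCfsFiles` are made up for illustration and are not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: counts compound segment files.
public class CfsFileCounter {

    /** Returns the number of entries in indexDir whose name ends with ".cfs". */
    public static int countCfsFiles(Path indexDir) throws IOException {
        int count = 0;
        // The glob is matched against the file name of each directory entry.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "*.cfs")) {
            for (Path ignored : stream) {
                count++;
            }
        }
        return count;
    }
}
```

Calling `CfsFileCounter.countCfsFiles(Paths.get("/path/to/index"))` after the writer has been closed shows how many compound segments remain; the path here is a placeholder.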

Update in response to comments

I recently migrated from 2.x to 4.10.2.

Quoting from the IndexWriter 4.10.2 documentation for commit():

Commits all pending changes (added & deleted documents, segment merges, added indexes, 
etc.) to the index, and syncs all referenced index files, such that a reader will see 
the changes and the index updates will survive an OS or machine crash or power loss. 
Note that this does not wait for any running background merges to finish. This may
be a costly operation, so you should test the cost in your application and do it only
when really necessary.

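What you can do is use a single IndexWriter and add all the records through it, without calling commit() after each one. Once all records have been added, just call indexwriter.close(), which takes care of the merge and commit process.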
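
A minimal sketch of that approach against the Lucene 4.10.2 API is shown below. The class name `BulkIndexer`, the method `indexAll`, and the field name `"body"` are assumptions made for illustration, and records are treated as plain strings; adapt the document construction to your own data. The try-with-resources block calls `close()` on the writer exactly once, after every record has been added.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical example class; requires lucene-core and lucene-analyzers-common 4.10.2.
public class BulkIndexer {

    /** Adds all records with one IndexWriter and returns the index's document count. */
    public static int indexAll(File indexDir, List<String> records) throws IOException {
        try (Directory dir = FSDirectory.open(indexDir)) {
            IndexWriterConfig config =
                    new IndexWriterConfig(Version.LUCENE_4_10_2, new ClassicAnalyzer());

            // One writer for the whole batch; it is closed once at the end of this block.
            try (IndexWriter writer = new IndexWriter(dir, config)) {
                for (String record : records) {
                    Document doc = new Document();
                    doc.add(new TextField("body", record, Field.Store.YES)); // assumed field name
                    writer.addDocument(doc); // no commit() per record
                }
            }

            // Reopen the index to confirm that the documents were committed on close.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();
            }
        }
    }
}
```

Because the default open mode appends to an existing index, the returned count includes documents that were already in `indexDir` before the call.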

A similar question about multiple CFS files being generated in Lucene 4.10.2 can be found on Stack Overflow: https://stackoverflow.com/questions/27685437/
