java - Lucene runs out of memory when the file is opened multiple times

Tags: java indexing lucene out-of-memory

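My application receives multiple requests per second, and bots crawl our site. I use Lucene for indexing and searching. On the first request after the site restarts, the application opens the Lucene index file and stores the opened reader, so from the second request onward it uses the stored object. The problem is that until the file has been fully opened and stored, several other requests also try to open the file again. This makes the site run out of memory after 5 to 10 minutes.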
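The situation described above is a classic check-then-act race: every request that checks the cache before the first open has finished sees "nothing stored yet" and opens its own copy. The sketch below reproduces that with plain Java; `NaiveCache`, `RaceDemo` and the `Object` standing in for the reader are hypothetical, and the latch deliberately forces the worst-case interleaving (all requests arrive during the slow open) so the effect is reproducible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the question's cache: open() plays the role of the
// slow DirectoryReader.open(...) call and `cached` the stored reader.
class NaiveCache {
    private Object cached;                       // no lock: unsafe on purpose
    final AtomicInteger opens = new AtomicInteger();
    private final CountDownLatch allInside;

    NaiveCache(int threads) {
        allInside = new CountDownLatch(threads);
    }

    private Object open() throws InterruptedException {
        opens.incrementAndGet();
        allInside.countDown();
        // Simulate a slow open: block (at most 5 s) until every caller has entered open()
        allInside.await(5, TimeUnit.SECONDS);
        return new Object();                     // each call allocates a new "reader"
    }

    // Check-then-act: every caller that sees null before the first open() returns opens again
    Object get() throws InterruptedException {
        if (cached == null) {
            cached = open();
        }
        return cached;
    }
}

class RaceDemo {
    // Starts `threads` concurrent "requests" and returns how many times open() ran
    static int opens(int threads) {
        NaiveCache cache = new NaiveCache(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    cache.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return cache.opens.get();
    }
}
```

With four concurrent requests, `RaceDemo.opens(4)` reports four opens; with a real index each of those is a full set of segment readers kept alive on the heap, which matches the `FieldInfos`/`SegmentReader` frames in the trace.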

This is the error:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeMap.put(Unknown Source)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:61)
    at org.apache.lucene.codecs.lucene42.Lucene42FieldInfosReader.read(Lucene42FieldInfosReader.java:96)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:121)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
    at com.webjaguar.web.frontend.LuceneCategery.getLuceneProduct(LuceneCategery.java:166)
    at com.webjaguar.web.frontend.CategoryController.handleRequest(CategoryController.java:1034)
    at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:312)
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116)
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
    at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
    at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
    at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)

Second error:

Exception in thread "Lucene Merge Thread #9" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:981)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:883)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:845)
    at com.webjaguar.thirdparty.lucene.LuceneProductIndexer.reIndex(LuceneProductIndexer.java:750)
    at com.webjaguar.web.quartz.LuceneProductJob.autoIndex(LuceneProductJob.java:90)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:273)
    at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:311)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:113)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:223)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)

This is the line where the error occurs:

reader = DirectoryReader.open(NIOFSDirectory.open(indexFile));

Is there a way to lock the file until the reader has been stored? Any solution that improves how this is implemented is welcome.
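One common way to get "lock until stored" inside a single JVM is to guard the lazy open with a lock and re-check after acquiring it (double-checked locking with a `volatile` field), so concurrent requests wait for the first open and then reuse its result. This is a sketch under that assumption, not the asker's code; `IOSupplier`, `OpenOnce` and `OpenOnceDemo` are hypothetical names, and the demo's opener is a dummy that just counts calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: like java.util.function.Supplier, but may throw IOException,
// as DirectoryReader.open(...) does.
interface IOSupplier<T> {
    T get() throws IOException;
}

// Opens a shared resource at most once. Requests that arrive while the first open is
// still running wait on the lock, then reuse the stored instance instead of opening
// their own copy.
class OpenOnce<T> {
    private final IOSupplier<T> opener;
    private volatile T instance;                 // volatile: safe publication for the lock-free fast path

    OpenOnce(IOSupplier<T> opener) {
        this.opener = opener;
    }

    T get() throws IOException {
        T result = instance;
        if (result == null) {                    // only early callers ever take the lock
            synchronized (this) {
                result = instance;
                if (result == null) {            // re-check: another thread may have finished opening
                    result = opener.get();       // if this throws, instance stays null and a later call retries
                    instance = result;
                }
            }
        }
        return result;
    }
}

class OpenOnceDemo {
    // Runs `threads` concurrent callers against one OpenOnce and returns how often the opener ran
    static int opens(int threads) {
        AtomicInteger count = new AtomicInteger();
        OpenOnce<Object> shared = new OpenOnce<>(() -> {
            count.incrementAndGet();
            try {
                Thread.sleep(50);                // simulate a slow open
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new Object();
        });
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    shared.get();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return count.get();
    }
}
```

Applied to the question, a single shared field such as `new OpenOnce<DirectoryReader>(() -> DirectoryReader.open(NIOFSDirectory.open(indexFile)))` would replace the unguarded assignment, with every request calling `get()`. Opening the reader eagerly at startup is a simpler alternative if a slower startup is acceptable. Note that this holder never refreshes the reader, so picking up changes from the reindex job would still need something like `DirectoryReader.openIfChanged(...)`.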

Best answer

You should look at NIOFSDirectory's LockFactory (inherited from the parent Directory). See the LockFactory Javadoc for a little more information.

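Apart from that, your requirements look to me like an NRT (near-real-time) use case. If you need to index and search within short intervals, and indexing runs continuously, an NRT implementation would make sense. I'm not sure whether this is already a feature of Lucene v4.2. See the Simple NRT tutorial for more information.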
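Lucene 4.x does ship NRT support, for example `DirectoryReader.open(IndexWriter, boolean)` and `SearcherManager` in `org.apache.lucene.search`, whose `acquire()`, `release(...)` and `maybeRefresh()` let many requests share one searcher while a newer one is swapped in after index updates. The stdlib-only sketch below illustrates just the reference-counting idea behind that pattern; it is not Lucene's implementation, and `Shared` and `SharedManager` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical shared resource with a reference count; "closes" when the count reaches zero.
class Shared {
    final String version;
    private final AtomicInteger refs = new AtomicInteger(1);   // 1 = the manager's own reference
    volatile boolean closed;

    Shared(String version) {
        this.version = version;
    }

    // Take a reference unless the resource has already been closed
    boolean tryIncRef() {
        int n;
        do {
            n = refs.get();
            if (n <= 0) {
                return false;
            }
        } while (!refs.compareAndSet(n, n + 1));
        return true;
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true;                     // a real reader would be closed here
        }
    }
}

class SharedManager {
    private final AtomicReference<Shared> current;

    SharedManager(Shared initial) {
        current = new AtomicReference<>(initial);
    }

    // Borrow the current instance; every acquire() must be paired with release()
    Shared acquire() {
        while (true) {
            Shared s = current.get();
            if (s.tryIncRef()) {
                return s;                      // otherwise it was swapped out and closed; retry
            }
        }
    }

    void release(Shared s) {
        s.decRef();
    }

    // Swap in a newer instance (e.g. after the indexer commits); the old one
    // closes only once the last in-flight request has released it
    void refresh(Shared newer) {
        Shared old = current.getAndSet(newer);
        old.decRef();
    }
}
```

The point of the design is that requests never open the index themselves: they borrow whatever instance is current, and only the refresh path (for instance, run after the Quartz reindex job) creates a new one.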

Regarding "java - Lucene runs out of memory when the file is opened multiple times", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51089114/
