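My application receives multiple requests per second, and we have bots crawling our site. I use Lucene for indexing and searching. On the first request after the site restarts, the application opens the Lucene index files and stores the resulting object, so from the second request onward it uses the stored object. The problem is that until the index has been fully opened and stored, multiple requests each try to open it again. This causes the site to run out of memory after 5-10 minutes.
Here is the error: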
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.TreeMap.put(Unknown Source)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:61)
at org.apache.lucene.codecs.lucene42.Lucene42FieldInfosReader.read(Lucene42FieldInfosReader.java:96)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:121)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
at com.webjaguar.web.frontend.LuceneCategery.getLuceneProduct(LuceneCategery.java:166)
at com.webjaguar.web.frontend.CategoryController.handleRequest(CategoryController.java:1034)
at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:312)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
Second error:
Exception in thread "Lucene Merge Thread #9" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:981)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:883)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:845)
at com.webjaguar.thirdparty.lucene.LuceneProductIndexer.reIndex(LuceneProductIndexer.java:750)
at com.webjaguar.web.quartz.LuceneProductJob.autoIndex(LuceneProductJob.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:273)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:311)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:113)
at org.quartz.core.JobRunShell.run(JobRunShell.java:223)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
This is the line that fails:
reader = DirectoryReader.open(NIOFSDirectory.open(indexFile));
Is there a way to lock the file until it has been stored? Any suggestions for improving how this is implemented are welcome.
Best answer
You should look at the LockFactory of NIOFSDirectory (inherited from its parent, Directory).
See the LockFactory Javadoc for a little more information.
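Beyond that, your requirements look to me like an NRT (near-real-time) use case. If you want documents to be searchable shortly after they are indexed, and indexing runs continuously, an NRT implementation would make sense. I am not sure whether this is already a feature of Lucene v4.2. See the Simple NRT tutorial for more information.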
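Separately from file locking, the startup race described in the question (several requests each calling DirectoryReader.open before the first result has been stored) can usually be avoided by letting only one thread perform the open while the others wait for its result. The following is a minimal sketch of that idea using plain Java and double-checked locking. The class name SharedResource and the opener parameter are hypothetical, not part of Lucene or of the asker's code, and the sketch assumes the opened object is safe to share between request threads once it has been built.

```java
import java.util.function.Supplier;

/**
 * Hypothetical open-once holder: the first caller runs the expensive
 * opener, concurrent callers block until it finishes, and every later
 * caller gets the same stored instance without taking the lock.
 */
final class SharedResource<T> {
    // The expensive call; assumed to return a non-null object.
    private final Supplier<T> opener;
    // volatile so a fully built instance is visible to all threads.
    private volatile T instance;

    SharedResource(Supplier<T> opener) {
        this.opener = opener;
    }

    T get() {
        T local = instance;
        if (local == null) {                 // fast path once initialized
            synchronized (this) {
                local = instance;
                if (local == null) {         // re-check while holding the lock
                    local = opener.get();    // runs at most once on success
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

In the question's code, the opener would wrap the existing DirectoryReader.open(NIOFSDirectory.open(indexFile)) call, with its IOException rethrown as an unchecked exception, and the request handler would call get() instead of opening the index itself. If the opener throws, nothing is stored and the next caller retries. For an index that keeps changing, Lucene's own SearcherManager serves a similar purpose and also handles refreshing the reader.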
Regarding "java - Lucene runs out of memory when the file is opened multiple times", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51089114/