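I'm trying to index about 40 GB of English Wikipedia, but it isn't working. I followed the tutorial at http://wiki.apache.org/solr/DataImportHandler#Configuring_DataSources, as well as other related Stack Overflow questions such as Indexing wikipedia with solr and Indexing wikipedia dump with solr.
Using the configuration explained in the tutorial, I was able to import Simple English Wikipedia (about 150,000 documents) and Portuguese Wikipedia (more than 1 million documents). The problem appears when I try to index English Wikipedia (more than 8 million documents). The import fails with the following error: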
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:539)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:34)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:254)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:279)
at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:307)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:324)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:569)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:705)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:504)
... 6 more
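I'm using a MacBook Pro with 4 GB of RAM and more than 120 GB of free disk space. I've already tried changing the 256 in solrconfig.xml, but with no success so far.
Can anyone help me?
EDIT
In case anyone runs into the same problem: I solved it with the command Cheffe suggested, java -Xmx1g -jar start.jar
Best answer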
Your Java VM is running out of memory. Give it more memory, as explained in the question Increase heap size in Java:
java -Xmx1024m myprogram
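As a quick sanity check, the snippet below prints the maximum heap the running JVM reports via `Runtime.maxMemory()`, so you can confirm that an `-Xmx` flag actually took effect. This is a minimal sketch: `HeapReport` is a hypothetical class, not part of Solr or the JDK, and the reported value may be somewhat below the `-Xmx` setting depending on the garbage collector in use.

```java
// Hypothetical helper: prints the maximum heap the current JVM may use.
// Compile it, then run it with the same -Xmx flag you give Solr,
// e.g. `java -Xmx1024m HeapReport`.
final class HeapReport {

    private HeapReport() {}

    /** Maximum heap in MiB, or -1 if the JVM reports no limit (Long.MAX_VALUE). */
    static long maxHeapMiB() {
        long max = Runtime.getRuntime().maxMemory();
        if (max == Long.MAX_VALUE) {
            return -1;
        }
        return max / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        if (mib < 0) {
            System.out.println("JVM reports no explicit heap limit");
        } else {
            // May be somewhat below the -Xmx value: with some collectors,
            // part of the young generation is excluded from this figure.
            System.out.println("Max heap: ~" + mib + " MiB");
        }
    }
}
```

A value close to 1024 MiB after `java -Xmx1024m HeapReport` suggests the flag was applied; a much smaller value suggests the JVM is still using its default limit.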
More details on the Xmx parameter can be found in the docs; just search for -Xmxsize:
Specifies the maximum size (in bytes) of the memory allocation pool in bytes. This value must be a multiple of 1024 and greater than 2 MB. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. The default value is chosen at runtime based on system configuration. For server deployments, -Xms and -Xmx are often set to the same value. For more information, see Garbage Collector Ergonomics at http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
The following examples show how to set the maximum allowed size of allocated memory to 80 MB using various units:
-Xmx83886080
-Xmx81920k
-Xmx80m
The -Xmx option is equivalent to -XX:MaxHeapSize.
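The unit rules quoted above are simple arithmetic: k/K multiplies by 1024, m/M by 1024 × 1024, and g/G by 1024 × 1024 × 1024. The hypothetical `XmxSize.toBytes` helper below (an illustration only, not a JVM or Solr API) converts a flag to bytes and applies the two constraints quoted from the docs, which shows why all three example flags denote the same 83886080 bytes (80 × 1024 × 1024).

```java
// Hypothetical helper: converts an -Xmx value such as "80m" into bytes,
// following the unit rules quoted from the java launcher docs.
final class XmxSize {

    private XmxSize() {}

    static long toBytes(String flag) {
        // Accept either "-Xmx80m" or just "80m".
        String s = flag.startsWith("-Xmx") ? flag.substring(4) : flag;
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size: " + flag);
        }
        char last = Character.toLowerCase(s.charAt(s.length() - 1));
        long multiplier;
        switch (last) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024; break;
            case 'g': multiplier = 1024L * 1024 * 1024; break;
            default:  multiplier = 1L; break; // no suffix: plain bytes
        }
        String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1);
        long bytes = Math.multiplyExact(Long.parseLong(digits), multiplier);
        // Constraints from the docs: a multiple of 1024 and greater than 2 MB.
        if (bytes % 1024 != 0 || bytes <= 2L * 1024 * 1024) {
            throw new IllegalArgumentException("invalid -Xmx size: " + flag);
        }
        return bytes;
    }

    public static void main(String[] args) {
        String[] flags = {"-Xmx83886080", "-Xmx81920k", "-Xmx80m", "-Xmx1g"};
        for (String f : flags) {
            System.out.println(f + " = " + toBytes(f) + " bytes");
        }
    }
}
```

By the same arithmetic, the -Xmx1g used in the EDIT above is 1073741824 bytes, four times the 256 MB that -Xmx256m would allow.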
Source: "solr - Indexing Wikipedia with Solr not working" on Stack Overflow: https://stackoverflow.com/questions/22596726/