java - org.apache.solr.common.SolrException: Bad Request Bad Request request: http://localhost:8080/solr/update?wt=javabin&version=2

Tags: java solr hadoop

Please help me. I am trying to crawl a website with Nutch, but it gives me the error "java.io.IOException: Job failed!"

I am running the command "bin/nutch solrindex http://<host name>:8080/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*" with Nutch 1.5.1, Solr 3.6.1, JDK java-7-openjdk-i386, and Ubuntu 12.04.

The hadoop.log file in the NUTCH/log folder shows the following:

2012-09-13 12:56:10,524 INFO  solr.SolrIndexer - SolrIndexer: starting at 2012-09-13 12:56:10

2012-09-13 12:56:10,604 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb

2012-09-13 12:56:10,604 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb

2012-09-13 12:56:10,604 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160403

2012-09-13 12:56:10,711 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160448

2012-09-13 12:56:10,715 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160631

2012-09-13 12:56:10,760 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2012-09-13 12:56:11,212 INFO  plugin.PluginRepository - Plugins: looking in: /home/zapbuild/Nutch/plugins

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository - Registered Plugins:

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository -     the nutch core extension points (nutch-extensionpoints)

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository -     Regex URL Normalizer (urlnormalizer-regex)

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository -     CyberNeko HTML Parser (lib-nekohtml)

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository -     OPIC Scoring Plug-in (scoring-opic)

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository -     Basic URL Normalizer (urlnormalizer-basic)

2012-09-13 12:56:11,310 INFO  plugin.PluginRepository -     Tika Parser Plug-in (parse-tika)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Basic Indexing Filter (index-basic)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Html Parse Plug-in (parse-html)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Anchor Indexing Filter (index-anchor)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     HTTP Framework (lib-http)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Regex URL Filter (urlfilter-regex)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Regex URL Filter Framework (lib-regex-filter)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Pass-through URL Normalizer (urlnormalizer-pass)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Http Protocol Plug-in (protocol-http)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository - Registered Extension-Points:

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Nutch Protocol (org.apache.nutch.protocol.Protocol)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Nutch URL Filter (org.apache.nutch.net.URLFilter)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Nutch Content Parser (org.apache.nutch.parse.Parser)

2012-09-13 12:56:11,311 INFO  plugin.PluginRepository -     Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)

2012-09-13 12:56:11,313 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter


2012-09-13 12:56:11,314 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:11,314 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:14,104 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:14,104 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:14,104 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:17,135 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:17,136 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:17,136 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:20,204 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:20,205 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:20,205 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:23,297 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:23,297 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:23,297 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:26,232 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:26,232 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:26,233 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:29,252 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:29,252 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:29,252 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:32,284 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:32,284 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:32,284 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:35,258 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:35,258 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:35,258 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:38,283 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:38,284 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:38,284 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:41,278 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:41,278 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:41,278 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:44,334 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:44,334 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:44,334 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:47,338 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:47,338 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:47,338 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:50,360 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:50,360 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:50,360 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:53,309 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter

2012-09-13 12:56:53,310 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2012-09-13 12:56:53,310 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: content dest: content

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: title dest: title

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: host dest: host

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: segment dest: segment

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: boost dest: boost

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: digest dest: digest

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: url dest: id

2012-09-13 12:56:53,357 INFO  solr.SolrMappingReader - source: url dest: url

2012-09-13 12:56:53,409 INFO  solr.SolrWriter - Indexing 18 documents

2012-09-13 12:56:53,604 WARN  mapred.LocalJobRunner - job_local_0001

org.apache.solr.common.SolrException: Missing solr core name in path

Missing solr core name in path

request: http://<host name>:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:142)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:466)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2012-09-13 12:56:53,981 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

I did not find any log files in Solr.

Please help me solve this problem; I am really confused.

Best answer

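Your log states the problem: Missing solr core name in path

The request should carry the Solr core name between /solr/ and /update?wt=...

Something like this: http://<host name>:8983/solr/<core_name>/update?wt=javabin&version=2

You should probably add the core name to the Solr URL in your nutch command.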
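As a minimal sketch of that change, the snippet below builds the Solr URL with a core name and prints the resulting command. The host `localhost` and the core name `core0` are placeholders assumed for illustration, not values from the question; the port 8983 is the one shown in the logged request. Use whatever host, port, and core your Solr instance actually serves.

```shell
#!/bin/sh
# Assumed values for illustration only: replace with your own.
SOLR_HOST="localhost"   # assumption: Solr runs on this machine
SOLR_PORT="8983"        # port taken from the logged request
CORE_NAME="core0"       # hypothetical core name

# The core name sits between /solr/ and the handler path (/update),
# so the base URL handed to Nutch must already end with the core name.
SOLR_URL="http://${SOLR_HOST}:${SOLR_PORT}/solr/${CORE_NAME}/"

# Print the command for review instead of running it.
echo "bin/nutch solrindex ${SOLR_URL} crawl/crawldb -linkdb crawl/linkdb crawl/segments/*"
```

Once the printed URL looks right, run the printed command from the Nutch directory. The remaining arguments are the same as in the command from the question.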

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/12369644/
