I am trying to run a crawl with Nutch 1.9 on CentOS 6.6.
I was following this guide to set up my first crawl:
http://wiki.apache.org/nutch/NutchTutorial
However, I get the following exception on startup:
Injector: Converting injected urls to crawl db entries.
Injector: java.net.UnknownHostException: Sparky.LITK: Sparky.LITK: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:960)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:324)
    at org.apache.nutch.crawl.Injector.run(Injector.java:380)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:370)
Caused by: java.net.UnknownHostException: Sparky.LITK: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
    ... 12 more
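It looks like Nutch is trying to crawl the machine's own hostname (Sparky.LITK), which is not what I want. I set up the seed.txt list as the tutorial describes, but the crawl gets stuck at this point.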
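Best answer
The fix is simple: add the machine's hostname to the /etc/hosts file so that it resolves to the loopback address (127.0.0.1). Nutch is not actually trying to crawl the hostname. As the stack trace shows, Hadoop's JobClient calls InetAddress.getLocalHost() when it submits the inject job, and that lookup fails because Sparky.LITK does not resolve to any address.
I changed my hosts entries as follows: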
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 Sparky.LITK
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 Sparky.LITK
And it worked!
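To confirm that the hostname now resolves before re-running the crawl, the same lookup that fails in the stack trace can be reproduced in isolation. The following is only a minimal sketch: the class name LocalHostCheck and the method describeLocalHost are made up for illustration and are not part of Nutch or Hadoop; the only call taken from the trace is InetAddress.getLocalHost().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of Nutch or Hadoop.
public class LocalHostCheck {

    // Performs the same lookup that Hadoop's JobClient triggers in the
    // stack trace and reports whether the local hostname resolves.
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // This is the failure mode from the question: the hostname
            // has no entry in /etc/hosts or DNS.
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If this prints a line starting with "FAILED:", the hostname still does not resolve and the Nutch inject step can be expected to fail in the same way; a line starting with "OK:" suggests the /etc/hosts entry is being picked up.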
Original question on Stack Overflow (java: error when trying to crawl with Nutch, java.net.UnknownHostException on the machine's own hostname): https://stackoverflow.com/questions/28228946/