hadoop - Handling multiple simultaneous connections to a host

How can I open multiple connections to the same host at the same time?

Best answer

From nutch-default.xml:

<property>
  <name>fetcher.threads.fetch</name>
  <value>10</value>
  <description>The number of FetcherThreads the fetcher should use.
    This also determines the maximum number of requests that are 
    made at once (each FetcherThread handles one connection).</description>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
  <description>This number is the maximum number of threads that
    should be allowed to access a host at one time.</description>
</property>

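As the descriptions say, the number of connections is at most the number of threads. The first property caps the total number of connections; the second caps the number of connections per host, and that is the one you need to change.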
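A minimal sketch of such an override, assuming the usual Nutch convention of putting site-specific values in `conf/nutch-site.xml` instead of editing `nutch-default.xml`. The values `20` and `5` are arbitrary examples, not recommendations. Keep `fetcher.threads.per.host` no larger than `fetcher.threads.fetch`, since the total thread count still bounds all connections.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Total fetcher threads: the upper bound on simultaneous
       connections across all hosts (example value). -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>20</value>
  </property>
  <!-- Maximum threads allowed to hit a single host at once (example value). -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>5</value>
  </property>
</configuration>
```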

Original question on Stack Overflow: https://stackoverflow.com/questions/2460433/
