java - Controlling the number of threads and object access in a multithreaded Java web crawler

I built a web crawler, but it is single-threaded. I am now extending it to use multiple threads, and there are two things I cannot work out:

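How many threads should I create? Should it be a fixed number, or a dynamic number that varies with the length of the queue holding the URIs? (Available memory also has to be taken into account.)
I have created a new class for the threads by implementing the Runnable interface, and I want each thread's run method to access an object that I created in the main class, which is the class that calls thread.start(). How should I access this object from each thread?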

I am using NetBeans.

Best answer

For the first question, I think a dynamically sized thread pool is the best fit in your case, for example:

ExecutorService exec = Executors.newCachedThreadPool();

Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources.
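As a rough illustration of the behavior described in the quote, the sketch below submits a handful of short tasks to a cached pool and then shuts it down. The class name `CachedPoolSketch`, the method `runTasks`, and the counter standing in for "fetch one URI" are assumptions made for this example and are not part of the original answer.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedPoolSketch {

    // Hypothetical helper: runs taskCount short tasks on a cached pool
    // and returns how many of them completed.
    static int runTasks(int taskCount) {
        ExecutorService exec = Executors.newCachedThreadPool();
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < taskCount; i++) {
            // Placeholder for the real work, e.g. fetching and parsing one URI.
            exec.execute(() -> {
                completed.incrementAndGet();
            });
        }

        // Stop accepting new tasks; tasks already submitted still run.
        exec.shutdown();
        try {
            if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
                exec.shutdownNow(); // give up on tasks that are still running
            }
        } catch (InterruptedException e) {
            exec.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed tasks: " + runTasks(20));
    }
}
```

Note that a cached pool puts no upper bound on the number of threads it creates. If memory is a concern, `Executors.newFixedThreadPool(n)` can be swapped in on the first line of `runTasks` without changing the rest of the sketch.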

For the second question, you can add a constructor and pass the object in that way:

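import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadTask implements Runnable {
     private final Object obj; // object created in the main class and shared with this task

     public ThreadTask(Object obj) {
         this.obj = obj;
     }

     @Override
     public void run() {
         // use obj here
     }
}

// "Main" is only an illustrative name for the class that owns main()
public class Main {
     public static void main(String[] args) {
         ExecutorService exec = Executors.newCachedThreadPool();
         Object obj = new Object();
         exec.submit(new ThreadTask(obj));
         exec.shutdown(); // let submitted tasks finish, then release the threads
     }
}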
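When several tasks receive the same object through their constructors, that object is accessed concurrently, so it generally needs to be thread-safe. The following is a minimal sketch under that assumption: the names `SharedObjectSketch`, `CrawlTask`, and `crawl`, the example URIs, and the choice of a concurrent set of visited URIs as the shared object are all hypothetical and not taken from the original answer.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedObjectSketch {

    // Hypothetical task: records one URI in a set shared by all tasks.
    static class CrawlTask implements Runnable {
        private final Set<String> visited; // shared, thread-safe set
        private final String uri;

        CrawlTask(Set<String> visited, String uri) {
            this.visited = visited;
            this.uri = uri;
        }

        @Override
        public void run() {
            // add() returns false if another task already recorded this URI;
            // a real crawler would fetch the page only when it returns true.
            visited.add(uri);
        }
    }

    // Creates the shared set, hands it to every task, and waits for them to finish.
    static Set<String> crawl(String... uris) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ExecutorService exec = Executors.newCachedThreadPool();
        for (String uri : uris) {
            exec.execute(new CrawlTask(visited, uri));
        }
        exec.shutdown();
        try {
            exec.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<String> visited = crawl(
                "http://example.com/a", "http://example.com/b", "http://example.com/a");
        System.out.println("distinct URIs: " + visited.size());
    }
}
```

Passing the set through the constructor keeps the dependency explicit, and using a concurrent collection avoids the need for manual synchronization around each access.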

Regarding java - Controlling the number of threads and object access in a multithreaded Java web crawler, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12721900/
