I am looking for a Python library that can distribute tasks across several servers. The tasks are similar to what the subprocess module can parallelize on a single machine.
I know I could set up a Hadoop cluster for this purpose. However, Hadoop is heavyweight. In my case, I want to use a shared network disk for data I/O, and I don't need any fancy failure recovery. In MapReduce terms, I only need mappers, no aggregators or reducers.
Is there such a library in Python? Thanks!
Best answer
Try celery.
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
Regarding "python - Any python library for parallel and distributed tasks?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17774262/