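Python threading: how to return results produced while multithreaded code is running

I am trying to learn how to use threads in Python to save a list of objects. I started from this code: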

import threading
import urllib
from tempfile import NamedTemporaryFile

singlelock = threading.Lock() 

class download(threading.Thread):
    def __init__(self, sitecode, lista):
        threading.Thread.__init__(self)
        self.sitecode = sitecode
        self.lista = lista  # keep a reference to the shared result list
        self.status = -1

    def run(self):
        url = "http://waterdata.usgs.gov/nwis/monthly?referred_module=sw&site_no="
        url += self.sitecode
        url += "&PARAmeter_cd=00060&partial_periods=on&format=rdb&submitted_form=parameter_selection_list"
        tmp = NamedTemporaryFile(delete=False)
        urllib.urlretrieve(url, tmp.name)
        print "loaded Monthly data for sitecode : ",  self.sitecode
        self.lista.append(tmp.name)
        print self.lista

sitecodelist = ["01046500", "01018500", "01010500", "01034500", "01059000", "01066000", "01100000"]
lista = []


for k in sitecodelist:
    get_data = download(k,lista)
    get_data.start()

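It only prints the list as it is built up during thread execution, whereas I am trying to return it.

Reading the documentation, I have been looking into threading.Lock() and its methods acquire() and release(), which seem to be the solution to my problem... but I am really struggling to understand how to apply them to my example code.

Any hints are much appreciated!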

Best answer

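First, we should all quickly review what a thread is: http://en.wikipedia.org/wiki/Thread_%28computer_science%29

OK, so threads share memory. So this should be easy! That is both the upside and the downside of threads: sharing is easy, and it is also dangerous! (Threads are also lightweight for the operating system.)

Now, if you are using Python with CPython, you should be familiar with the global interpreter lock: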
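Because the memory is shared, one way to get results back is to hand every thread the same list and read it only after all threads have been joined. The following is a minimal sketch under stated assumptions, not the asker's actual downloader: it is written for Python 3, `fake_download` is a hypothetical stand-in for the real network call, and `Download` and `download_all` are illustrative names.

```python
import threading


def fake_download(sitecode):
    # Hypothetical stand-in for the real network call; returns a made-up file name.
    return "/tmp/monthly_%s.rdb" % sitecode


class Download(threading.Thread):
    def __init__(self, sitecode, results, lock):
        threading.Thread.__init__(self)
        self.sitecode = sitecode
        self.results = results  # the list shared by all threads
        self.lock = lock        # guards access to the shared list

    def run(self):
        name = fake_download(self.sitecode)
        # "with" calls acquire() on entry and release() on exit.
        with self.lock:
            self.results.append(name)


def download_all(sitecodes):
    results = []
    lock = threading.Lock()
    threads = [Download(code, results, lock) for code in sitecodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the results
    return results


if __name__ == "__main__":
    print(download_all(["01046500", "01018500", "01010500"]))
```

The order of the returned list depends on which thread finishes first. In CPython a single `list.append` is already atomic, so the lock here mainly illustrates the pattern; it becomes necessary once a thread does more than one step on the shared object.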

http://docs.python.org/glossary.html#term-global-interpreter-lock

Also, from http://docs.python.org/library/threading.html :

CPython implementation detail: Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

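What does this mean? If your tasks are not I/O bound, you gain nothing from operating-system threads, because whenever Python code is running, only one thread can make progress: it holds the global lock and no other thread can acquire it. For I/O-bound tasks, the operating system will schedule other threads, because the global lock is released while waiting for the I/O to complete. Note that you may also call code that releases the GIL, in which case threads will perform well too (hence the "performance-oriented libraries" in the quote above).

Thankfully, Python makes managing shared memory a simple task, and there is already good documentation on how to do it, although it took me a while to find it. Let us know if you have any other questions.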
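The I/O-bound case can be observed with a small timing experiment. This is only a sketch, written for Python 3, and it assumes `time.sleep` is an acceptable stand-in for a blocking read or network call (it releases the GIL while waiting); `fake_io` and `run_in_threads` are hypothetical names.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep releases the GIL while it waits, much like blocking I/O does.
    time.sleep(seconds)


def run_in_threads(n, seconds):
    # Start n threads that each "wait on I/O" and return the total wall time.
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    elapsed = run_in_threads(4, 0.2)
    print("4 threads x 0.2 s of waiting took %.2f s" % elapsed)
```

If the waits overlap as described, the total should be close to 0.2 s rather than the 0.8 s that four sequential waits would take. A CPU-bound loop in place of `fake_io` would not be expected to show the same speedup under CPython.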

In [83]: import _threading_local

In [84]: help(_threading_local)
Help on module _threading_local:

NAME
    _threading_local - Thread-local objects.

FILE
    /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_threading_local.py

MODULE DOCS
    http://docs.python.org/library/_threading_local

DESCRIPTION
    (Note that this module provides a Python version of the threading.local
     class.  Depending on the version of Python you're using, there may be a
     faster one available.  You should always import the `local` class from
     `threading`.)

    Thread-local objects support the management of thread-local data.
    If you have data that you want to be local to a thread, simply create
    a thread-local object and use its attributes:

      >>> mydata = local()
      >>> mydata.number = 42
      >>> mydata.number
      42

    You can also access the local-object's dictionary:

      >>> mydata.__dict__
      {'number': 42}
      >>> mydata.__dict__.setdefault('widgets', [])
      []
      >>> mydata.widgets
      []

    What's important about thread-local objects is that their data are
    local to a thread. If we access the data in a different thread:

      >>> log = []
      >>> def f():
      ...     items = mydata.__dict__.items()
      ...     items.sort()
      ...     log.append(items)
      ...     mydata.number = 11
      ...     log.append(mydata.number)

      >>> import threading
      >>> thread = threading.Thread(target=f)
      >>> thread.start()
      >>> thread.join()
      >>> log
      [[], 11]

    we get different data.  Furthermore, changes made in the other thread
    don't affect data seen in this thread:

      >>> mydata.number
      42

    Of course, values you get from a local object, including a __dict__
    attribute, are for whatever thread was current at the time the
    attribute was read.  For that reason, you generally don't want to save
    these values across threads, as they apply only to the thread they
    came from.

    You can create custom local objects by subclassing the local class:

      >>> class MyLocal(local):
      ...     number = 2
      ...     initialized = False
      ...     def __init__(self, **kw):
      ...         if self.initialized:
      ...             raise SystemError('__init__ called too many times')
      ...         self.initialized = True
      ...         self.__dict__.update(kw)
      ...     def squared(self):
      ...         return self.number ** 2

    This can be useful to support default values, methods and
    initialization.  Note that if you define an __init__ method, it will be
    called each time the local object is used in a separate thread.  This
    is necessary to initialize each thread's dictionary.

    Now if we create a local object:

      >>> mydata = MyLocal(color='red')

    Now we have a default number:

      >>> mydata.number
      2

    an initial color:

      >>> mydata.color
      'red'
      >>> del mydata.color

    And a method that operates on the data:

      >>> mydata.squared()
      4

    As before, we can access the data in a separate thread:

      >>> log = []
      >>> thread = threading.Thread(target=f)
      >>> thread.start()
      >>> thread.join()
      >>> log
      [[('color', 'red'), ('initialized', True)], 11]

    without affecting this thread's data:

      >>> mydata.number
      2
      >>> mydata.color
      Traceback (most recent call last):
      ...
      AttributeError: 'MyLocal' object has no attribute 'color'

    Note that subclasses can define slots, but they are not thread
    local. They are shared across threads:

      >>> class MyLocal(local):
      ...     __slots__ = 'number'

      >>> mydata = MyLocal()
      >>> mydata.number = 42
      >>> mydata.color = 'red'

    So, the separate thread:

      >>> thread = threading.Thread(target=f)
      >>> thread.start()
      >>> thread.join()

    affects what we see:

      >>> mydata.number
      11

    >>> del mydata
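The contrast between thread-local and shared data in the help text above can be condensed into a few lines. This is a sketch for Python 3 using `threading.local` directly, as the help text recommends; `demo`, `worker`, and `shared` are illustrative names. It also shows why a value that should be returned to the caller has to go into an ordinary shared object rather than a thread-local one.

```python
import threading


def demo():
    local_data = threading.local()  # each thread sees its own attributes
    shared = []                     # ordinary object, visible to every thread

    def worker(value):
        local_data.number = value   # visible only inside this worker thread
        shared.append(local_data.number)

    local_data.number = 42          # set in the calling thread
    t = threading.Thread(target=worker, args=(11,))
    t.start()
    t.join()
    # The caller still sees its own 42; the worker's 11 arrived via the shared list.
    return local_data.number, shared


if __name__ == "__main__":
    print(demo())  # (42, [11])
```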

Just in case... here is an example in the style you used above.

In [40]: class TestThread(threading.Thread):
    ...:     report = list() #shared across threads
    ...:     def __init__(self):
    ...:         threading.Thread.__init__(self)
    ...:         self.io_bound_variation = random.randint(1,100)
    ...:     def run(self):
    ...:         start = datetime.datetime.now()
    ...:         print '%s - io_bound_variation - %s' % (self.name, self.io_bound_variation)
    ...:         for _ in range(0, self.io_bound_variation):
    ...:             with open(self.name, 'w') as f:
    ...:                 for i in range(10000):
    ...:                     f.write(str(i) + '\n')
    ...:         print '%s - finished' % (self.name)
    ...:         end = datetime.datetime.now()
    ...:         print '%s took %s time' % (self.name, end - start)
    ...:         self.report.append(end - start)
    ...:

Then run three threads and print the output.

    In [43]: threads = list()
        ...: for i in range(3):
        ...:     t = TestThread()
        ...:     t.start()
        ...:     threads.append(t)
        ...: 
        ...: for thread in threads:
        ...:     thread.join()
        ...:     
        ...: for thread in threads:
        ...:     print thread.report
        ...:     
    Thread-28 - io_bound_variation - 76
    Thread-29 - io_bound_variation - 83
    Thread-30 - io_bound_variation - 80
    Thread-28 - finished
    Thread-28 took 0:00:08.173861 time
    Thread-30 - finished
    Thread-30 took 0:00:08.407255 time
    Thread-29 - finished
    Thread-29 took 0:00:08.491480 time
    [datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
    [datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
    [datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]

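You may be wondering why report has more than three elements... That is because I ran the for-loop code above three times in my interpreter. To fix this "bug", I would need to make sure the shared variable is reset to an empty list before each run.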

TestThread.report = list()

This illustrates why threads can get unwieldy.
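One way to sidestep the stale class-level list is to keep each result on the thread instance and collect the results after `join()`. This is a different technique from the shared `report` attribute used above, shown only as a sketch for Python 3; `Worker`, `run_batch`, and the squaring step are placeholders for real work.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, n):
        threading.Thread.__init__(self)
        self.n = n
        self.result = None  # instance attribute: one slot per thread

    def run(self):
        self.result = self.n * self.n  # placeholder for the real work


def run_batch(values):
    threads = [Worker(v) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # results are only safe to read after join()
    return [t.result for t in threads]


if __name__ == "__main__":
    print(run_batch([1, 2, 3]))  # [1, 4, 9]
    print(run_batch([1, 2, 3]))  # [1, 4, 9] again, nothing carried over
```

Because every run builds new thread objects, nothing has to be reset between runs, and the results come back in the order the threads were created rather than the order they finished.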

Regarding Python threading and how to return results produced while multithreaded code is running, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8457940/
