I am trying to learn how to use threads in Python to save a list of objects. I started with this code:
import threading
import urllib
from tempfile import NamedTemporaryFile

singlelock = threading.Lock()

class download(threading.Thread):
    def __init__(self, sitecode, lista):
        threading.Thread.__init__(self)
        self.sitecode = sitecode
        self.status = -1

    def run(self):
        url = "http://waterdata.usgs.gov/nwis/monthly?referred_module=sw&site_no="
        url += self.sitecode
        url += "&PARAmeter_cd=00060&partial_periods=on&format=rdb&submitted_form=parameter_selection_list"
        tmp = NamedTemporaryFile(delete=False)
        urllib.urlretrieve(url, tmp.name)
        print "loaded Monthly data for sitecode : ", self.sitecode
        lista.append(tmp.name)
        print lista

sitecodelist = ["01046500", "01018500", "01010500", "01034500", "01059000", "01066000", "01100000"]
lista = []

for k in sitecodelist:
    get_data = download(k,lista)
    get_data.start()
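It only prints the list as it is built while the threads run, whereas I am trying to return it.
Reading the documentation, I have been looking at how to use threading.Lock() and its methods acquire() and release(), which seem to be the solution to my problem, but I am really struggling to understand how to apply them to my example code.
Any hints are much appreciated!
Best answer
First, we should all quickly review what a thread is: http://en.wikipedia.org/wiki/Thread_%28computer_science%29 .
OK, so threads share memory. So this should be easy! That is both the upside and the downside of threads: it is easy and also dangerous! (Threads are also lightweight for the operating system.)
Now, if you are using Python with CPython, you should be familiar with the global interpreter lock:
http://docs.python.org/glossary.html#term-global-interpreter-lock
Also, from http://docs.python.org/library/threading.html :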
CPython implementation detail: Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better of use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
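What does this mean? If your task is not I/O bound, you gain nothing from threads, because whenever Python code is executing, only one thread can run: it holds the global lock and no other thread can acquire it. For I/O-bound tasks, the operating system will schedule other threads, because the global lock is released while waiting for I/O to complete. Note that you may also call code that is not subject to the GIL, in which case threads will also perform well (hence the mention of "performance-oriented libraries" in the quote above).
Thankfully, Python makes managing shared memory a simple task, and there is already good documentation on how to do it, although it took me a little while to find. Let us know if you have any further questions.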
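To make the I/O-bound case concrete, here is a minimal sketch in Python 3 syntax (the code elsewhere in this thread is Python 2). It assumes time.sleep is an acceptable stand-in for a blocking network or disk call, since sleeping also releases the GIL; fake_io_task and run_in_threads are hypothetical names used only for illustration.

```python
import threading
import time


def fake_io_task(seconds):
    # Stand-in for blocking I/O: time.sleep releases the GIL while waiting.
    time.sleep(seconds)


def run_in_threads(n_tasks, seconds):
    """Start n_tasks threads that each wait `seconds`; return the wall-clock time."""
    threads = [threading.Thread(target=fake_io_task, args=(seconds,))
               for _ in range(n_tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(4, 0.2)
# The waits overlap, so this should be much closer to 0.2s than to 4 * 0.2s.
print("4 simulated I/O tasks took %.2f seconds" % elapsed)
```

If fake_io_task were a pure-Python computation instead of a wait, the threads would take turns holding the GIL and the total time would likely be no better than running the tasks one after another.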
In [83]: import _threading_local
In [84]: help(_threading_local)
Help on module _threading_local:
NAME
_threading_local - Thread-local objects.
FILE
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_threading_local.py
MODULE DOCS
http://docs.python.org/library/_threading_local
DESCRIPTION
(Note that this module provides a Python version of the threading.local
class. Depending on the version of Python you're using, there may be a
faster one available. You should always import the `local` class from
`threading`.)
Thread-local objects support the management of thread-local data.
If you have data that you want to be local to a thread, simply create
a thread-local object and use its attributes:
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.number
42
You can also access the local-object's dictionary:
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
What's important about thread-local objects is that their data are
local to a thread. If we access the data in a different thread:
>>> log = []
>>> def f():
...     items = mydata.__dict__.items()
...     items.sort()
...     log.append(items)
...     mydata.number = 11
...     log.append(mydata.number)
>>> import threading
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[], 11]
we get different data. Furthermore, changes made in the other thread
don't affect data seen in this thread:
>>> mydata.number
42
Of course, values you get from a local object, including a __dict__
attribute, are for whatever thread was current at the time the
attribute was read. For that reason, you generally don't want to save
these values across threads, as they apply only to the thread they
came from.
You can create custom local objects by subclassing the local class:
>>> class MyLocal(local):
...     number = 2
...     initialized = False
...     def __init__(self, **kw):
...         if self.initialized:
...             raise SystemError('__init__ called too many times')
...         self.initialized = True
...         self.__dict__.update(kw)
...     def squared(self):
...         return self.number ** 2
This can be useful to support default values, methods and
initialization. Note that if you define an __init__ method, it will be
called each time the local object is used in a separate thread. This
is necessary to initialize each thread's dictionary.
Now if we create a local object:
>>> mydata = MyLocal(color='red')
Now we have a default number:
>>> mydata.number
2
an initial color:
>>> mydata.color
'red'
>>> del mydata.color
And a method that operates on the data:
>>> mydata.squared()
4
As before, we can access the data in a separate thread:
>>> log = []
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[('color', 'red'), ('initialized', True)], 11]
without affecting this thread's data:
>>> mydata.number
2
>>> mydata.color
Traceback (most recent call last):
...
AttributeError: 'MyLocal' object has no attribute 'color'
Note that subclasses can define slots, but they are not thread
local. They are shared across threads:
>>> class MyLocal(local):
...     __slots__ = 'number'
>>> mydata = MyLocal()
>>> mydata.number = 42
>>> mydata.color = 'red'
So, the separate thread:
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
affects what we see:
>>> mydata.number
11
>>> del mydata
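The help text above is from Python 2.7 (for example, items.sort() does not work on Python 3). As a rough Python 3 equivalent of its first example, the sketch below imports local from threading, as the help text recommends; the names worker and seen_in_worker are illustrative only.

```python
import threading

mydata = threading.local()
mydata.number = 42  # set in the main thread

seen_in_worker = []  # ordinary list, shared by all threads


def worker():
    # The worker thread gets its own empty namespace: main's `number` is not visible.
    seen_in_worker.append(hasattr(mydata, "number"))
    mydata.number = 11  # only affects the worker's copy
    seen_in_worker.append(mydata.number)


t = threading.Thread(target=worker)
t.start()
t.join()

print(seen_in_worker)  # [False, 11]
print(mydata.number)   # 42
```

Note that thread-local data isolates state per thread. Collecting results from several threads calls for the opposite: an ordinary object that all threads can see.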
Just in case, here is an example in the style of your code above.
In [40]: class TestThread(threading.Thread):
    ...:     report = list() #shared across threads
    ...:     def __init__(self):
    ...:         threading.Thread.__init__(self)
    ...:         self.io_bound_variation = random.randint(1,100)
    ...:     def run(self):
    ...:         start = datetime.datetime.now()
    ...:         print '%s - io_bound_variation - %s' % (self.name, self.io_bound_variation)
    ...:         for _ in range(0, self.io_bound_variation):
    ...:             with open(self.name, 'w') as f:
    ...:                 for i in range(10000):
    ...:                     f.write(str(i) + '\n')
    ...:         print '%s - finished' % (self.name)
    ...:         end = datetime.datetime.now()
    ...:         print '%s took %s time' % (self.name, end - start)
    ...:         self.report.append(end - start)
    ...:
Then run three threads and print the output.
In [43]: threads = list()
    ...: for i in range(3):
    ...:     t = TestThread()
    ...:     t.start()
    ...:     threads.append(t)
    ...:
    ...: for thread in threads:
    ...:     thread.join()
    ...:
    ...: for thread in threads:
    ...:     print thread.report
    ...:
Thread-28 - io_bound_variation - 76
Thread-29 - io_bound_variation - 83
Thread-30 - io_bound_variation - 80
Thread-28 - finished
Thread-28 took 0:00:08.173861 time
Thread-30 - finished
Thread-30 took 0:00:08.407255 time
Thread-29 - finished
Thread-29 took 0:00:08.491480 time
[datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
[datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
[datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
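You might wonder why the report has more than three elements. That is because I ran the for-loop code above three times in my interpreter. To fix this "bug", I need to make sure the shared variable is reset to an empty list before each run.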
TestThread.report = list()
This illustrates why threads can become unwieldy.
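One way to avoid the stale class-level list and actually return the results, sketched here in Python 3 syntax, is to create the list and a threading.Lock inside a function, hand both to every thread, and join all threads before returning. This is only an illustration under assumptions: the real download is replaced by a made-up file name so the sketch runs without network access, and Download and download_all are hypothetical names, not part of the original code.

```python
import threading


class Download(threading.Thread):
    def __init__(self, sitecode, results, lock):
        threading.Thread.__init__(self)
        self.sitecode = sitecode
        self.results = results  # list owned by the caller, shared by all threads
        self.lock = lock

    def run(self):
        # Hypothetical placeholder for the real work (e.g. retrieving the URL
        # into a temporary file and recording that file's name).
        filename = "data_%s.rdb" % self.sitecode
        with self.lock:  # calls acquire() on entry and release() on exit
            self.results.append(filename)


def download_all(sitecodes):
    results = []  # fresh list on every call, so nothing carries over between runs
    lock = threading.Lock()
    threads = [Download(code, results, lock) for code in sitecodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has appended its result
    return results


files = download_all(["01046500", "01018500", "01010500"])
print(sorted(files))  # completion order is not guaranteed, so sort for display
```

In CPython a single list.append is already atomic, so the lock is not strictly required here; it is included to show where acquire() and release() fit, and it becomes necessary as soon as the protected section does more than one step on shared data.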
This Q&A, on Python threads and how to return results produced while multithreaded code runs, is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/8457940/