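I am using pickle to save an object graph by dumping the root. When I load the root, it has all of its instance variables and connected object nodes. However, I also keep all of the nodes in a class variable of dictionary type. The class variable is full before saving, but after I unpickle the data it is empty.
Here is the class I am using: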
import re
from urllib2 import urlopen  # Python 2.7 (import assumed; not shown in the post)

class Page():
    __crawled = {}

    def __init__(self, title = '', link = '', relatedURLs = []):
        self.__title = title
        self.__link = link
        self.__relatedURLs = relatedURLs
        self.__related = []

    @property
    def relatedURLs(self):
        return self.__relatedURLs

    @property
    def title(self):
        return self.__title

    @property
    def related(self):
        return self.__related

    @property
    def crawled(self):
        return self.__crawled

    def crawl(self,url):
        if url not in self.__crawled:
            webpage = urlopen(url).read()
            patFinderTitle = re.compile('<title>(.*)</title>')
            patFinderLink = re.compile('<link rel="canonical" href="([^"]*)" />')
            patFinderRelated = re.compile('<li><a href="([^"]*)"')
            findPatTitle = re.findall(patFinderTitle, webpage)
            findPatLink = re.findall(patFinderLink, webpage)
            findPatRelated = re.findall(patFinderRelated, webpage)
            newPage = Page(findPatTitle,findPatLink,findPatRelated)
            self.__related.append(newPage)
            self.__crawled[url] = newPage
        else:
            self.__related.append(self.__crawled[url])

    def crawlRelated(self):
        for link in self.__relatedURLs:
            self.crawl(link)
This is how I save it:
# Binary mode, as the pickle documentation recommends
with open('medTwiceGraph.dat', 'wb') as outf:
    pickle.dump(root, outf)
And this is how I load it:
def loadGraph(filename): #returns root
    # Binary mode, as the pickle documentation recommends
    with open(filename, 'rb') as inf:
        return pickle.load(inf)

root = loadGraph('medTwiceGraph.dat')
All of the data is loaded except the class variable __crawled.
What am I doing wrong?
Best answer
Python does not really pickle class objects. It only saves their names and where to find them. From the pickle documentation:
Similarly, classes are pickled by named reference, so the same restrictions in the unpickling environment apply. Note that none of the class’s code or data is pickled, so in the following example the class attribute attr is not restored in the unpickling environment:

    class Foo:
        attr = 'a class attr'

    picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in the top level of a module.
Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled. This is done on purpose, so you can fix bugs in a class or add methods to the class and still load objects that were created with an earlier version of the class. If you plan to have long-lived objects that will see many versions of a class, it may be worthwhile to put a version number in the objects so that suitable conversions can be made by the class’s __setstate__() method.
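To make the quoted behaviour concrete, here is a minimal sketch. The class Node and its registry attribute are hypothetical stand-ins for Page and __crawled, and clearing the dictionary is only a way to imitate a fresh interpreter in which the class body has just been re-executed.

```python
import pickle

class Node(object):
    registry = {}              # class attribute: stored on the class object

    def __init__(self, name):
        self.name = name       # instance attribute: stored in self.__dict__
        Node.registry[name] = self

root = Node('root')
data = pickle.dumps(root)      # only the instance's own state is serialized

# Imitate a new interpreter session: the class-level dict starts out empty.
Node.registry.clear()

restored = pickle.loads(data)  # __init__ is not called during unpickling
print(restored.name)                 # root
print(Node.registry)                 # {}
print('registry' in vars(restored))  # False
```

The instance state comes back, but nothing repopulates the dictionary that lives on the class.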
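In your example, you can solve the problem by changing __crawled to an instance attribute or a global variable. Note that a module-level dictionary is not pickled automatically either, so it has to be dumped explicitly together with the root.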
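One possible way to apply this, sketched under assumptions: the simplified Page class below and the helpers dump_graph and load_graph are hypothetical, not the asker's code. The idea is to pickle the root and the shared dictionary in a single call, so that objects referenced from both places are stored once and stay identical after loading.

```python
import pickle

class Page(object):
    crawled = {}   # class-level cache; not part of any instance's state

    def __init__(self, title=''):
        self.title = title
        self.related = []

def dump_graph(root):
    # Hypothetical helper: bundle the root with the class-level dict so that
    # both end up in the same pickle.
    return pickle.dumps((root, Page.crawled), pickle.HIGHEST_PROTOCOL)

def load_graph(data):
    # Hypothetical helper: restore the root and refill the class-level dict.
    root, crawled = pickle.loads(data)
    Page.crawled.clear()
    Page.crawled.update(crawled)
    return root

root = Page('root')
child = Page('child')
root.related.append(child)
Page.crawled['http://example.com/child'] = child

data = dump_graph(root)
Page.crawled.clear()           # imitate a fresh interpreter
restored = load_graph(data)

print(sorted(Page.crawled))
print(Page.crawled['http://example.com/child'] is restored.related[0])
```

Because both objects go through one pickle.dumps call, the child page reachable from the root and the one stored in the dictionary are the same object after loading. Dumping them in two separate calls would produce two independent copies.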
Regarding python: pickle.dump in Python 2.7 not saving a class variable dictionary, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16637464/