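python - gzip pickle dump: saving multiple items

I'm having some trouble using gzip together with pickle. Basically I have the following code, which tries to save a random dataset with gzip and pickle.dump: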

import pickle
import gzip
import torch
import numpy as np

with gzip.open('test.pt', "wb") as f:
    for d in range(50):
        a = np.random.rand(3,2).astype(np.float32)
        aa = torch.from_numpy(a)
        pickle.dump({'name': str(d), 'person': 'test', 'text1':'blah', 
                    'text2': 'blah', 'data': aa}, f)        

with gzip.open('test.pt', "rb") as f:
    data4 = pickle.load(f)
print(data4)

Only the first element is printed. Why?

I expected the line above to output:

[{'name': '0', 'person': 'test', 'text1': 'blah', 'text2': 'blah',
  'data': tensor([[0.8789, 0.4588],
                  [0.0728, 0.6768],
                  [0.9147, 0.2786]])},
  {'name': '1', 'person': 'test', 'text1': 'blah', 'text2': 'blah',
   'data': tensor([[0.8789, 0.4588],
                   [0.0728, 0.6768],
                   [0.9147, 0.2786]])},
  {'name': '2', 'person': 'test', 'text1': 'blah', 'text2': 'blah',
   'data': tensor([[0.8789, 0.4588],
                   [0.0728, 0.6768],
                   [0.9147, 0.2786]])},
  ...,
  {'name': '49', 'person': 'test', 'text1': 'blah', 'text2': 'blah',
   'data': tensor([[0.8789, 0.4588],
                   [0.0728, 0.6768],
                   [0.9147, 0.2786]])}]

for d in data4:
    print(d)

# prints: name, person, text1, text2, data, why ???

The output I expect is:

{'name': '0', 'person': 'test', 'text1': 'blah', 'text2': 'blah', 'data': tensor([[0.8789, 0.4588],[0.0728, 0.6768],[0.9147, 0.2786]])}
{'name': '1', 'person': 'test', 'text1': 'blah', 'text2': 'blah', 'data': tensor([[0.8789, 0.4588],[0.0728, 0.6768],[0.9147, 0.2786]])}
{'name': '2', 'person': 'test', 'text1': 'blah', 'text2': 'blah', 'data': tensor([[0.8789, 0.4588],[0.0728, 0.6768],[0.9147, 0.2786]])}
...
{'name': '49', 'person': 'test', 'text1': 'blah', 'text2': 'blah', 'data': tensor([[0.8789, 0.4588],[0.0728, 0.6768],[0.9147, 0.2786]])}

When I do this,

for d in data4:
    print(d['name'])

I get:

TypeError Traceback (most recent call last)
<ipython-input-25-13370f7fdb11> in <module>
22 
23 for d in data4:
---> 24     print(d['name'])
TypeError: string indices must be integers

Finally, I don't really understand why I can't access it with d['name'].

Any help is greatly appreciated!

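Best answer

Pickle is a self-contained format: each pickle.dump call writes one complete pickle, and pickle.load decodes exactly one of them per call. You cannot simply concatenate several pickles and expect a single load to return them all. That is why data4 is only the first dict, and why iterating over it yields its keys (strings) rather than dicts. If you want to write multiple objects to a file and read them back later, you need to add some information to the file itself so you know how to load each object.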
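To see the symptom from the question in isolation, here is a minimal sketch that needs only the standard library. It is not the asker's code: the tensors are left out and an in-memory io.BytesIO buffer stands in for test.pt, both of which are assumptions made to keep the example small.

```python
import gzip
import io
import pickle

# In-memory stand-in for 'test.pt' (assumption, to avoid touching the disk)
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    for d in range(3):
        # Three separate pickles written back to back
        pickle.dump({"name": str(d), "person": "test"}, f)

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    first = pickle.load(f)  # one call returns one object

print(type(first).__name__)  # dict
print(list(first))           # ['name', 'person']

error = None
try:
    for key in first:  # iterating a dict yields its keys, which are strings
        key["name"]    # indexing a string with a string raises TypeError
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

So the loop in the question was walking over the keys of the first dict, not over 50 dicts.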

A simple way to read multiple objects back is to write each object's length in front of it, so you know how big each one is:

import pickle
import gzip
import torch
import numpy as np
import struct

with gzip.open('test.pt', "wb") as f:
    for d in range(50):
        a = np.random.rand(3,2).astype(np.float32)
        aa = torch.from_numpy(a)
        temp = pickle.dumps({'name': str(d), 'person': 'test', 'text1':'blah', 
                    'text2': 'blah', 'data': aa})
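        # "<L" is always 4 bytes; plain "L" uses the native size,
        # which is 8 bytes on many 64-bit platforms and would not match read(4)
        f.write(struct.pack("<L", len(temp)))
        f.write(temp)

data4 = []
with gzip.open('test.pt', "rb") as f:
    while True:
        data_length = f.read(4)
        if len(data_length) == 0:
            # No more data
            break
        data_length = struct.unpack("<L", data_length)[0]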
        data4.append(pickle.loads(f.read(data_length)))
print(data4)
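The same length-prefix idea can be wrapped in two small helpers. This is only a sketch: the names write_record and read_records are hypothetical, the tensors are omitted, and an io.BytesIO buffer replaces the file on disk. It uses the "<L" format, which is a fixed 4-byte unsigned integer on every platform, and it also checks for truncated data.

```python
import gzip
import io
import pickle
import struct

# Fixed-size header: 4-byte little-endian unsigned int
HEADER = struct.Struct("<L")

def write_record(f, obj):
    """Write one pickled object preceded by its length (hypothetical helper)."""
    payload = pickle.dumps(obj)
    f.write(HEADER.pack(len(payload)))
    f.write(payload)

def read_records(f):
    """Yield every length-prefixed object in f (hypothetical helper)."""
    while True:
        header = f.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise EOFError("truncated length header")
        (length,) = HEADER.unpack(header)
        payload = f.read(length)
        if len(payload) < length:
            raise EOFError("truncated record")
        yield pickle.loads(payload)

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    for d in range(3):
        write_record(f, {"name": str(d), "person": "test"})

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    records = list(read_records(f))

print([r["name"] for r in records])  # ['0', '1', '2']
```

Because read_records is a generator, the objects can also be processed one at a time without holding all of them in memory.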

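Or, more directly, you can save the whole list at once:

import pickle
import gzip
import torch
import numpy as np

# Build the full list in memory first
data = []
for d in range(50):
    a = np.random.rand(3,2).astype(np.float32)
    aa = torch.from_numpy(a)
    data.append({'name': str(d), 'person': 'test', 'text1':'blah',
                'text2': 'blah', 'data': aa})
# A single dump writes the whole list as one pickle
with gzip.open('test.pt', "wb") as f:
    pickle.dump(data, f)

# A single load returns the whole list
with gzip.open('test.pt', "rb") as f:
    data4 = pickle.load(f)
print(data4)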
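One more option, which is not part of the answer above and is offered only as a sketch: each pickle.load call consumes one pickle and leaves the file positioned at the start of the next, so a file written with repeated pickle.dump calls can be read back by calling pickle.load until it raises EOFError. The helper name load_all is hypothetical, and the example again uses plain dicts and an in-memory buffer instead of tensors and test.pt.

```python
import gzip
import io
import pickle

def load_all(f):
    """Yield every pickled object in f, one pickle.load call per object."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return  # no more pickles in the stream

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    for d in range(3):
        pickle.dump({"name": str(d), "person": "test"}, f)

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    items = list(load_all(f))

print([d["name"] for d in items])  # ['0', '1', '2']
```

With this approach the writing loop from the question can stay as it is; only the reading side changes.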

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/66164479/
