How do I read and store an 8D array from a .dat file in Python?
My binary file looks like the following. I want each string to be one row.
['r 11 1602 24 1622 0\n', 'i 26 1602 36 1631 0\n',
'v 37 1602 57 1621 0\n', 'e 59 1602 76 1622 0\n',
'r 77 1602 91 1622 1\n', 'h 106 1602 127 1631 0\n',
'e 127 1602 144 1622 1\n', 'h 160 1602 181 1631 0\n',
'e 181 1602 198 1622 0\n', 'a 200 1602 218 1622 0\n',
'r 218 1602 232 1622 0\n', 'd 234 1602 254 1631 1\n',
't 268 1602 280 1627 0\n', 'h 280 1602 301 1631 0\n',
'e 302 1602 319 1622 1\n', 'd 335 1602 355 1631 0\n']
When I try this:
file1 = open('data/train1.dat', 'rb')
train1_dat = np.loadtxt(file1.readlines(), delimiter=',')
print train1_dat
I get this error:
ValueError: could not convert string to float: r 11 1602 24 1622 0
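Best answer
Assuming your .dat file looks exactly as shown in your question, we first create a data string that mimics this format. We read it in as a string and then convert it into a form suitable for loading into numpy.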
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
d = StringIO("""['r 11 1602 24 1622 0\n', 'i 26 1602 36 1631 0\n',
'v 37 1602 57 1621 0\n', 'e 59 1602 76 1622 0\n',
'r 77 1602 91 1622 1\n', 'h 106 1602 127 1631 0\n',
'e 127 1602 144 1622 1\n', 'h 160 1602 181 1631 0\n',
'e 181 1602 198 1622 0\n', 'a 200 1602 218 1622 0\n',
'r 218 1602 232 1622 0\n', 'd 234 1602 254 1631 1\n',
't 268 1602 280 1627 0\n', 'h 280 1602 301 1631 0\n',
'e 302 1602 319 1622 1\n', 'd 335 1602 355 1631 0\n'] """)
data = d.read()  # read the contents of the .dat file
data = data.strip()  # remove leading and trailing whitespace
data = data.replace('\n', '')  # remove all newlines
data = data.replace("', '", "','")  # normalize the separators
data = data[2:-2]  # drop the leading [' and the trailing ']
data = data.split("','")  # split into a clean list of rows
data = '\n'.join(data)  # re-join into one string to load into numpy
print(data)  # have a look at the new string format
The resulting .dat string looks like this:
r 11 1602 24 1622 0
i 26 1602 36 1631 0
v 37 1602 57 1621 0
e 59 1602 76 1622 0
r 77 1602 91 1622 1
h 106 1602 127 1631 0
e 127 1602 144 1622 1
h 160 1602 181 1631 0
e 181 1602 198 1622 0
a 200 1602 218 1622 0
r 218 1602 232 1622 0
d 234 1602 254 1631 1
t 268 1602 280 1627 0
h 280 1602 301 1631 0
e 302 1602 319 1622 1
d 335 1602 355 1631 0
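If the file really holds a Python-style list literal as shown in the question, the manual string clean-up above might be replaced by `ast.literal_eval` from the standard library. This is only a sketch under that assumption. The names `raw`, `rows` and `cleaned` are illustrative, and `raw` stands in for the text read from the file (shortened to four rows here).

```python
import ast

# Stand-in for the file contents; a raw string keeps the backslash-n
# sequences literal, as they would appear in a file holding a list repr.
raw = r"""['r 11 1602 24 1622 0\n', 'i 26 1602 36 1631 0\n',
'v 37 1602 57 1621 0\n', 'e 59 1602 76 1622 0\n']"""

# literal_eval safely parses the text as a Python list of strings
rows = ast.literal_eval(raw)

# Drop the newline at the end of each row and re-join into one string
cleaned = '\n'.join(row.strip() for row in rows)
print(cleaned)
```

The `cleaned` string has the same one-row-per-line layout as the output above, so it can be passed on to numpy in the same way.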
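Silly footnote: I find it amusing that the first column appears to be an acrostic, "river he heard the d...", and that a 1 in the last column marks the end of each word :-) Not that it is any of my business.
More seriously, if you can arrange for your .dat file to be in this format from the start, none of the steps above would be necessary. Now we are ready to import it easily into a numpy array: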
import numpy as np
from io import StringIO  # Python 3 location of StringIO
d = StringIO(data)
# The column names 'a' to 'f' are arbitrary
# and can be changed to suit.
# The numbers are all imported as floats, which is also an arbitrary choice.
# 'U1' stores the letter as a 1-character text string on Python 3.
data = np.loadtxt(d, dtype={'names': ('a', 'b', 'c', 'd', 'e', 'f'),
                            'formats': ('U1', 'f', 'f', 'f', 'f', 'f')})
print(data)
The result looks like this (the exact float formatting depends on the NumPy version):
[('r', 11.0, 1602.0, 24.0, 1622.0, 0.0)
('i', 26.0, 1602.0, 36.0, 1631.0, 0.0)
('v', 37.0, 1602.0, 57.0, 1621.0, 0.0)
('e', 59.0, 1602.0, 76.0, 1622.0, 0.0)
('r', 77.0, 1602.0, 91.0, 1622.0, 1.0)
('h', 106.0, 1602.0, 127.0, 1631.0, 0.0)
('e', 127.0, 1602.0, 144.0, 1622.0, 1.0)
('h', 160.0, 1602.0, 181.0, 1631.0, 0.0)
('e', 181.0, 1602.0, 198.0, 1622.0, 0.0)
('a', 200.0, 1602.0, 218.0, 1622.0, 0.0)
('r', 218.0, 1602.0, 232.0, 1622.0, 0.0)
('d', 234.0, 1602.0, 254.0, 1631.0, 1.0)
('t', 268.0, 1602.0, 280.0, 1627.0, 0.0)
('h', 280.0, 1602.0, 301.0, 1631.0, 0.0)
('e', 302.0, 1602.0, 319.0, 1622.0, 1.0)
('d', 335.0, 1602.0, 355.0, 1631.0, 0.0)]
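The result is a structured array, so each column can be pulled out by its field name. The sketch below assumes Python 3 and the arbitrary field names 'a' to 'f' used above. The variables `text`, `letters` and `numbers` are illustrative, and only three of the rows are used to keep it short.

```python
from io import StringIO
import numpy as np

# A few rows in the cleaned-up format, standing in for the full data
text = """r 77 1602 91 1622 1
h 106 1602 127 1631 0
e 127 1602 144 1622 1"""

data = np.loadtxt(StringIO(text),
                  dtype={'names': ('a', 'b', 'c', 'd', 'e', 'f'),
                         'formats': ('U1', 'f', 'f', 'f', 'f', 'f')})

# First column as an array of 1-character strings
letters = data['a']

# Remaining columns stacked into a plain 2-D float array
numbers = np.column_stack([data[name] for name in ('b', 'c', 'd', 'e', 'f')])

print(letters)
print(numbers.shape)
```

Keeping the letter column apart from the numeric columns like this may be convenient if the numbers are later fed to code that expects an ordinary float array.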
Regarding "python - loading .dat and .npy into Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33582503/