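I have been trying different things for far too long.
How do I load CSV data that contains dates into a numpy array? The code below does not work the way I expect. It produces a single row, and everything that should be a row of its own ends up in one cell.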
import io
import numpy as np
import datetime as dt

def date_parser(d_bytes):
    s = d_bytes.decode('utf-8')
    return np.datetime64(dt.datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))

def read_csv():
    five_min_candles_str = """2020-06-01 17:05:00,9506.01,9523.31,9500.0,9514.52
2020-06-01 17:10:00,9513.44,9525.22,9500.32,9522.0
2020-06-01 17:15:00,9521.56,9525.59,9513.75,9523.53
2020-06-01 17:20:00,9523.21,9525.53,9518.78,9524.55
2020-06-01 17:25:00,9524.55,9538.4,9522.93,9528.73
2020-06-01 17:30:00,9528.73,9548.98,9527.95,9543.72
2020-06-01 17:35:00,9542.71,9547.34,9536.57,9543.66
2020-06-01 17:40:00,9543.67,9543.67,9530.0,9531.85
2020-06-01 17:45:00,9530.84,9535.01,9524.1,9526.75
2020-06-01 17:50:00,9526.47,9538.64,9521.87,9534.57
2020-06-01 17:55:00,9534.58,9548.9,9533.04,9546.98
2020-06-01 18:00:00,9548.18,9558.9,9536.99,9556.25
2020-06-01 18:05:00,9556.15,9579.8,9547.7,9574.09
2020-06-01 18:10:00,9575.0,9592.59,9571.3,9573.93
2020-06-01 18:15:00,9573.68,9610.0,9569.6,9597.78
2020-06-01 18:20:00,9597.78,9598.85,9578.0,9591.39
"""
    nparray = np.genfromtxt(io.StringIO(five_min_candles_str),
                            delimiter=',',
                            dtype=[('Timestamp','datetime64[us]'),
                                   ('Open','object'),
                                   ('High','object'),
                                   ('Low','object'),
                                   ('Close','object')],
                            converters={0: date_parser},
                            )
    print(nparray)

if __name__ == "__main__":
    read_csv()
Any solution or hint would be greatly appreciated!
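EDIT: It turns out the code was already working, but I expected a 2D array, and it became an array of tuples once I added the types or converters. The reason is that each row mixes different types, so numpy returns a 1-D structured array with one record per row. See the other SO question.
In any case, I marked the answer below as correct because I like it better: it does not need any custom date parsing, and I also prefer the splitlines() solution over io.StringIO().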
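To illustrate the point made in the edit, here is a minimal sketch of how a structured result can be turned back into a plain 2-D array for the numeric columns. It assumes NumPy 1.16 or later (for `structured_to_unstructured`); the two sample rows and the variable names `rows`, `candles`, and `prices` are chosen here for illustration only.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Two sample rows in the same layout as the question's data.
rows = [
    "2020-06-01 17:05:00,9506.01,9523.31,9500.0,9514.52",
    "2020-06-01 17:10:00,9513.44,9525.22,9500.32,9522.0",
]
candle_dtype = [("Timestamp", "datetime64[s]"), ("Open", "f8"),
                ("High", "f8"), ("Low", "f8"), ("Close", "f8")]

candles = np.genfromtxt(rows, delimiter=",", dtype=candle_dtype, encoding=None)

# Mixed column types give one record per row: a 1-D array, not (rows, 5).
print(candles.shape)

# Columns are addressed by field name.
timestamps = candles["Timestamp"]

# The four float columns share a type, so they can become a real 2-D array.
prices = structured_to_unstructured(candles[["Open", "High", "Low", "Close"]])
print(prices.shape)
```

The timestamp column stays separate because a regular 2-D array can only hold one element type.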
Best answer
In [53]: five_min_candles_str = """2020-06-01 17:05:00,9506.01,9523.31,9500.0,9514.52
...: 2020-06-01 17:10:00,9513.44,9525.22,9500.32,9522.0
...: 2020-06-01 17:15:00,9521.56,9525.59,9513.75,9523.53
...: 2020-06-01 17:20:00,9523.21,9525.53,9518.78,9524.55
...: 2020-06-01 17:25:00,9524.55,9538.4,9522.93,9528.73
...: 2020-06-01 17:30:00,9528.73,9548.98,9527.95,9543.72
...: 2020-06-01 17:35:00,9542.71,9547.34,9536.57,9543.66
...: 2020-06-01 17:40:00,9543.67,9543.67,9530.0,9531.85
...: 2020-06-01 17:45:00,9530.84,9535.01,9524.1,9526.75
...: 2020-06-01 17:50:00,9526.47,9538.64,9521.87,9534.57
...: 2020-06-01 17:55:00,9534.58,9548.9,9533.04,9546.98
...: 2020-06-01 18:00:00,9548.18,9558.9,9536.99,9556.25
...: 2020-06-01 18:05:00,9556.15,9579.8,9547.7,9574.09
...: 2020-06-01 18:10:00,9575.0,9592.59,9571.3,9573.93
...: 2020-06-01 18:15:00,9573.68,9610.0,9569.6,9597.78
...: 2020-06-01 18:20:00,9597.78,9598.85,9578.0,9591.39
...: """
Let's see how numpy handles these date strings. It is not as powerful as pandas, but:
In [55]: np.array('2020-06-01 17:05:00', 'datetime64[s]')
Out[55]: array('2020-06-01T17:05:00', dtype='datetime64[s]')
That looks fine, though. A space between the date and the time is accepted (a "T" works too).
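A quick way to check that claim, as a small sketch (the variable names are illustrative): both spellings should parse to the same `np.datetime64` value.

```python
import numpy as np

# Same instant written with a space and with the ISO 8601 "T" separator.
with_space = np.datetime64("2020-06-01 17:05:00")
with_t = np.datetime64("2020-06-01T17:05:00")

# Both strings parse to the same value.
print(with_space == with_t)

# The unit is inferred from the string; here it is seconds.
print(with_space.dtype)
```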
So let's try the fully automatic dtype:
In [56]: data=np.genfromtxt(five_min_candles_str.splitlines(), delimiter=',', dtype=None, encoding=True)
In [57]: data
Out[57]:
array([('2020-06-01 17:05:00', 9506.01, 9523.31, 9500. , 9514.52),
('2020-06-01 17:10:00', 9513.44, 9525.22, 9500.32, 9522. ),
('2020-06-01 17:15:00', 9521.56, 9525.59, 9513.75, 9523.53),
...
('2020-06-01 18:20:00', 9597.78, 9598.85, 9578. , 9591.39)],
dtype=[('f0', '<U19'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
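The first column was read as a string ('<U19'), so we need to specify a datetime dtype for it (by editing the dtype shown above):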
In [58]: dt = [('f0', 'datetime64[s]'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')]
In [59]: data=np.genfromtxt(five_min_candles_str.splitlines(), delimiter=',', dtype=dt, encoding=True)
In [60]: data
Out[60]:
array([('2020-06-01T17:05:00', 9506.01, 9523.31, 9500. , 9514.52),
('2020-06-01T17:10:00', 9513.44, 9525.22, 9500.32, 9522. ),
('2020-06-01T17:15:00', 9521.56, 9525.59, 9513.75, 9523.53),
('2020-06-01T17:20:00', 9523.21, 9525.53, 9518.78, 9524.55),
...
('2020-06-01T18:20:00', 9597.78, 9598.85, 9578. , 9591.39)],
dtype=[('f0', '<M8[s]'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
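Once the timestamps are real `datetime64` values, the columns can be used directly for time arithmetic and filtering. The following is a sketch under assumptions: the descriptive field names are borrowed from the question rather than from the answer's `dt` list, only three sample rows are used, and `candle_dtype`, `steps`, and `later` are illustrative names.

```python
import numpy as np

rows = """2020-06-01 17:05:00,9506.01,9523.31,9500.0,9514.52
2020-06-01 17:10:00,9513.44,9525.22,9500.32,9522.0
2020-06-01 17:15:00,9521.56,9525.59,9513.75,9523.53
""".splitlines()

# Same layout as the answer's dtype, with descriptive names instead of f0..f4.
candle_dtype = [("Timestamp", "datetime64[s]"), ("Open", "f8"),
                ("High", "f8"), ("Low", "f8"), ("Close", "f8")]
candles = np.genfromtxt(rows, delimiter=",", dtype=candle_dtype, encoding=None)

# Differences between consecutive timestamps are timedelta64 values.
steps = np.diff(candles["Timestamp"])
print(steps)

# A boolean mask on the timestamp column selects whole records.
later = candles[candles["Timestamp"] >= np.datetime64("2020-06-01T17:10:00")]
print(later["Close"])

# Float columns support ordinary vectorized arithmetic.
ranges = candles["High"] - candles["Low"]
print(ranges)
```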
Regarding "python - How to load CSV data containing dates into a numpy array?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62161006/