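I'm using pandas_market_calendars to get the New York Stock Exchange market calendar. I assign the calendar to the variable nyse_calendar and notice that its type is <class 'pandas_market_calendars.exchange_calendar_nyse.NYSEExchangeCalendar'>. That isn't helpful, because my main data file is stored in a numpy array whose elements have type <class 'numpy.str_'>. So I built a schedule from nyse_calendar and converted it to a numpy array using .to_numpy(). When I print the element type now, it returns <class 'pandas._libs.tslibs.timestamps.Timestamp'>, which is also not useful, because I want to compare the dates and times in my main data file against this calendar.
When I print a value from the trading_days array after converting it to a string, it returns 2011-09-20 13:30:00+00:00.
So what I want to do is loop over trading_days (the numpy array) and use a combination of .split() calls to turn each value into strings. Here is the code for more context: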
import numpy as np
import pandas_market_calendars as mkt_cal
from datetime import datetime
import pandas as pd
#set up the NYSE trading calendar
#create new market calendar
nyse_calendar = mkt_cal.get_calendar('NYSE')
#create a dataframe with only trading days - includes early closes
#needs to be from beginning of testing to end of testing data
nyse_schedule = nyse_calendar.schedule(start_date='2011-09-18', end_date='2019-12-05')
#convert dataframe to a numpy array
#reference: trading_days[0,0], trading_days[1,0] etc.
#open date & time in col 0, close date & time in col 1
trading_days = nyse_schedule.to_numpy()
print(trading_days)
>>>[[Timestamp('2011-09-19 13:30:00+0000', tz='UTC')
Timestamp('2011-09-19 20:00:00+0000', tz='UTC')]
[Timestamp('2011-09-20 13:30:00+0000', tz='UTC')
Timestamp('2011-09-20 20:00:00+0000', tz='UTC')]
[Timestamp('2011-09-21 13:30:00+0000', tz='UTC')
Timestamp('2011-09-21 20:00:00+0000', tz='UTC')]
...
[Timestamp('2019-12-03 14:30:00+0000', tz='UTC')
Timestamp('2019-12-03 21:00:00+0000', tz='UTC')]
[Timestamp('2019-12-04 14:30:00+0000', tz='UTC')
Timestamp('2019-12-04 21:00:00+0000', tz='UTC')]
[Timestamp('2019-12-05 14:30:00+0000', tz='UTC')
Timestamp('2019-12-05 21:00:00+0000', tz='UTC')]]
print("trading data type: ",type(trading_days[1,0]))
print("trading data: ", trading_days[1,0])
>>>trading data type: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
trading data: 2011-09-20 13:30:00+00:00
#now going to loop through the nyse calendar, convert to string and return in new numpy array
#date, open time, close time
exchng_cal = np.empty((trading_days.shape[0],3),dtype=str)
for i in range(trading_days.shape[0]-1):
    temp_str_open = str(trading_days[i,0])
    print(temp_str_open)
    temp_str_close = str(trading_days[i,1])
    print(temp_str_close)
    #date
    exchng_cal[i,0] = temp_str_open.split()[0]
    print(temp_str_open.split()[0])
    #open time
    exchng_cal[i,1] = temp_str_open.split()[1].split('+')[0]
    print(temp_str_open.split()[1].split('+')[0])
    #close time
    exchng_cal[i,2] = temp_str_close.split()[1].split('+')[0]
    print(temp_str_close.split()[1].split('+')[0])
print(exchng_cal)
>>>2019-12-04 14:30:00+00:00
2019-12-04 21:00:00+00:00
2019-12-04
14:30:00
21:00:00
[['2' '1' '2']
['2' '1' '2']
['2' '1' '2']
...
['2' '1' '2']
['2' '1' '2']
['' '' '']]
I've shortened the final printout, but as you can see, when I print the individual elements they print with the correct values, yet when I print exchng_cal it returns ['2' '1' '2'].
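Best answer
In numpy you have to specify the string length (see np.chararray). With dtype=str the default length is 1, so your values are truncated to their first character. Since your data structure needs strings of different lengths, this may be a solution: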
exchng_cal = np.empty((trading_days.shape[0],3),dtype='object')
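To make the truncation concrete, here is a minimal sketch that uses only numpy. The array names (truncated, as_object, fixed) and the 'U10' width are illustrative assumptions, not part of the original code; the sample strings are taken from the question's printed output.

```python
import numpy as np

# dtype=str without a length becomes a 1-character unicode dtype,
# so every assigned string is cut down to its first character.
truncated = np.empty((2, 3), dtype=str)
truncated[0, 0] = "2011-09-20"
truncated[0, 1] = "13:30:00"
truncated[0, 2] = "20:00:00"
print(truncated.dtype)   # 1-character unicode dtype
print(truncated[0])      # first characters only: '2', '1', '2'

# dtype=object stores references to ordinary Python str objects,
# so strings of any length are kept whole.
as_object = np.empty((2, 3), dtype=object)
as_object[0, 0] = "2011-09-20"
as_object[0, 1] = "13:30:00"
as_object[0, 2] = "20:00:00"
print(as_object[0])

# Alternative (assumption: no field is longer than 10 characters):
# a fixed-width unicode dtype keeps the array a true string array.
fixed = np.zeros((2, 3), dtype="U10")
fixed[0, 0] = "2011-09-20"
fixed[0, 1] = "13:30:00"
fixed[0, 2] = "20:00:00"
print(fixed[0])
```

The first array reproduces the ['2' '1' '2'] rows from the question: each is the first character of the date, the open time, and the close time. Separately, the loop in the question runs over range(trading_days.shape[0]-1), which skips the last row; that appears to be why the final row prints as empty strings, and using range(trading_days.shape[0]) would likely fill it.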
Source: the Stack Overflow question "python - Moving data from one NumPy array to another returns wrong data": https://stackoverflow.com/questions/59323308/