python - Moving data from one NumPy array to another returns incorrect data

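I am using pandas_market_calendars to get the market calendar for the New York Stock Exchange. I assigned the calendar to the variable nyse_calendar and noticed that its type is <class 'pandas_market_calendars.exchange_calendar_nyse.NYSEExchangeCalendar'>. That is not helpful, because my main data file is stored in a NumPy array whose elements are of type <class 'numpy.str_'>. So I built a schedule from nyse_calendar and converted it to a NumPy array with .to_numpy(). When I print the type of an element now, I get <class 'pandas._libs.tslibs.timestamps.Timestamp'>, which is no more useful, because I want to compare the dates and times in my main data file against this calendar.

When I print a value from the trading_days array after converting it to a string, I get 2011-09-20 13:30:00+00:00.

So what I want to do is loop over trading_days (the NumPy array) and turn each value into string pieces with a combination of .split() calls. Here is the code for better context: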

import numpy as np
import pandas_market_calendars as mkt_cal
from datetime import datetime
import pandas as pd

#set up the NYSE trading calendar
#create new market calendar
nyse_calendar = mkt_cal.get_calendar('NYSE')

#create a dataframe with only trading days - includes early closes
#needs to be from beginning of testing to end of testing data
nyse_schedule = nyse_calendar.schedule(start_date='2011-09-18', end_date='2019-12-05')

#convert dataframe to a numpy array
#reference: trading_days[0,0], trading_days[1,0] etc.
#open date & time in col 0, close date & time in col 1
trading_days = nyse_schedule.to_numpy()

print(trading_days)
>>>[[Timestamp('2011-09-19 13:30:00+0000', tz='UTC')
  Timestamp('2011-09-19 20:00:00+0000', tz='UTC')]
 [Timestamp('2011-09-20 13:30:00+0000', tz='UTC')
  Timestamp('2011-09-20 20:00:00+0000', tz='UTC')]
 [Timestamp('2011-09-21 13:30:00+0000', tz='UTC')
  Timestamp('2011-09-21 20:00:00+0000', tz='UTC')]
 ...
 [Timestamp('2019-12-03 14:30:00+0000', tz='UTC')
  Timestamp('2019-12-03 21:00:00+0000', tz='UTC')]
 [Timestamp('2019-12-04 14:30:00+0000', tz='UTC')
  Timestamp('2019-12-04 21:00:00+0000', tz='UTC')]
 [Timestamp('2019-12-05 14:30:00+0000', tz='UTC')
  Timestamp('2019-12-05 21:00:00+0000', tz='UTC')]]

print("trading data type: ",type(trading_days[1,0]))
print("trading data: ", trading_days[1,0])
>>>trading data type:  <class 'pandas._libs.tslibs.timestamps.Timestamp'>
trading data:  2011-09-20 13:30:00+00:00

#now going to loop through the nyse calendar, convert to string and return in new numpy array
#date, open time, close time
exchng_cal = np.empty((trading_days.shape[0],3),dtype=str)

for i in range(trading_days.shape[0]-1):
    temp_str_open = str(trading_days[i,0])
    print(temp_str_open)
    temp_str_close = str(trading_days[i,1])
    print(temp_str_close)
    #date
    exchng_cal[i,0] = temp_str_open.split()[0]
    print(temp_str_open.split()[0])
    #open time
    exchng_cal[i,1] = temp_str_open.split()[1].split('+')[0]
    print(temp_str_open.split()[1].split('+')[0])
    #close time
    exchng_cal[i,2] = temp_str_close.split()[1].split('+')[0]
    print(temp_str_close.split()[1].split('+')[0])

print(exchng_cal)
>>>2019-12-04 14:30:00+00:00
2019-12-04 21:00:00+00:00
2019-12-04
14:30:00
21:00:00
[['2' '1' '2']
 ['2' '1' '2']
 ['2' '1' '2']
 ...
 ['2' '1' '2']
 ['2' '1' '2']
 ['' '' '']]

I have shortened the final printout, but as you can see, the individual elements print with the correct values, yet printing exchng_cal returns ['2' '1' '2'].

Best answer

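In NumPy, a string dtype needs an explicit length (see np.chararray). With dtype=str the length defaults to 1, so your values are truncated to their first character. Since your data holds strings of different lengths, this may be a solution: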

exchng_cal = np.empty((trading_days.shape[0],3),dtype='object')
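
To make the truncation and the fix easier to see, here is a minimal sketch that runs without pandas_market_calendars. It is an illustration under assumptions, not the asker's exact setup: the two sample rows are hand-built datetime objects copied from the printed output above and stand in for the pandas Timestamp values, and strftime is swapped in for the str()/.split() chain. It also loops over range(trading_days.shape[0]) rather than shape[0]-1, since the shorter range appears to be why the last row of the printed array stayed empty.

```python
from datetime import datetime, timezone

import numpy as np

# Stand-in for trading_days: (open, close) pairs in UTC.
# Hypothetical sample rows copied from the output printed in the question.
trading_days = np.array(
    [
        [datetime(2011, 9, 19, 13, 30, tzinfo=timezone.utc),
         datetime(2011, 9, 19, 20, 0, tzinfo=timezone.utc)],
        [datetime(2011, 9, 20, 13, 30, tzinfo=timezone.utc),
         datetime(2011, 9, 20, 20, 0, tzinfo=timezone.utc)],
    ],
    dtype=object,
)

# dtype=str without a length yields a one-character string dtype,
# so every assignment keeps only the first character.
truncated = np.empty((trading_days.shape[0], 3), dtype=str)
truncated[0, 0] = "2011-09-19"
print(truncated.dtype, truncated[0, 0])

# dtype=object stores references to the Python strings, so nothing is cut off.
exchng_cal = np.empty((trading_days.shape[0], 3), dtype=object)

# range(shape[0]) visits every row, including the last one.
for i in range(trading_days.shape[0]):
    open_ts, close_ts = trading_days[i]
    exchng_cal[i, 0] = open_ts.strftime("%Y-%m-%d")   # date
    exchng_cal[i, 1] = open_ts.strftime("%H:%M:%S")   # open time
    exchng_cal[i, 2] = close_ts.strftime("%H:%M:%S")  # close time

print(exchng_cal)
```

A fixed-width dtype such as 'U10' would probably work here as well, because the longest value stored (the date) is ten characters, but dtype=object avoids having to pick a width up front.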

Regarding "python - Moving data from one NumPy array to another returns incorrect data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59323308/
