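I recently started developing a project that uses Stocker (an API that runs fbprophet to do machine learning on stock data). I like the simplicity of the API, but it has one fatal flaw: it uses quandl to fetch its stock data. Quandl stopped updating their data sometime in 2018, and you cannot run an accurate model on stale data. I looked through the Stocker code and, as far as I can tell, it uses quandl in only one line: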
stock = quandl.get('%s/%s' % (exchange, ticker))
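That line returns the stock data as a pandas DataFrame. I figured that since this is all quandl is used for, I could write my own version of quandl that pulls the data from a different source (IEX) and returns it as a DataFrame. I wrote the code (attached below), but I keep getting this error when creating a model in Stocker: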
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Date'
I'm lost on this one and not very familiar with pandas. Any help is greatly appreciated!
The relevant part of Stocker, showing how it uses quandl to get the stock data
# Quandl for financial analysis, pandas and numpy for data manipulation
# fbprophet for additive models, #pytrends for Google trend data
#import quandl
import stockdata
import pandas as pd
import numpy as np
import fbprophet
import pytrends
from pytrends.request import TrendReq
# matplotlib pyplot for plotting
import matplotlib.pyplot as plt
import matplotlib
# Class for analyzing and (attempting) to predict future prices
# Contains a number of visualizations and analysis methods
class Stocker():
    # Initialization requires a ticker symbol
    def __init__(self, ticker, exchange='IEX'):
        # Enforce capitalization
        ticker = ticker.upper()
        # Symbol is used for labeling plots
        self.symbol = ticker
        # Use Personal Api Key
        # quandl.ApiConfig.api_key = 'YourKeyHere'
        # Retrieval the financial data
        try:
            stock = stockdata.get(ticker)
            print(stock)
        except Exception as e:
            print('Error Retrieving Data.')
            print(e)
            return
        # Set the index to a column called Date
        stock = stock.reset_index(level=0)
        # Columns required for prophet
        stock['ds'] = stock['Date']
        if ('Adj. Close' not in stock.columns):
            stock['Adj. Close'] = stock['Close']
            stock['Adj. Open'] = stock['Open']
        stock['y'] = stock['Adj. Close']
        stock['Daily Change'] = stock['Adj. Close'] - stock['Adj. Open']
        # Data assigned as class attribute
        self.stock = stock.copy()
        # Minimum and maximum date in range
        self.min_date = min(stock['Date'])
        self.max_date = max(stock['Date'])
        # Find max and min prices and dates on which they occurred
        self.max_price = np.max(self.stock['y'])
        self.min_price = np.min(self.stock['y'])
        self.min_price_date = self.stock[self.stock['y'] == self.min_price]['Date']
        self.min_price_date = self.min_price_date[self.min_price_date.index[0]]
        self.max_price_date = self.stock[self.stock['y'] == self.max_price]['Date']
        self.max_price_date = self.max_price_date[self.max_price_date.index[0]]
        # The starting price (starting with the opening price)
        self.starting_price = float(self.stock.ix[0, 'Adj. Open'])
        # The most recent price
        self.most_recent_price = float(self.stock.ix[len(self.stock) - 1, 'y'])
        # Whether or not to round dates
        self.round_dates = True
        # Number of years of data to train on
        self.training_years = 3
        # Prophet parameters
        # Default prior from library
        self.changepoint_prior_scale = 0.05
        self.weekly_seasonality = False
        self.daily_seasonality = False
        self.monthly_seasonality = True
        self.yearly_seasonality = True
        self.changepoints = None
        print('{} Stocker Initialized. Data covers {} to {}.'.format(self.symbol,
                                                                     self.min_date.date(),
                                                                     self.max_date.date()))
Quandl's get function
def get(dataset, **kwargs):
    """Return dataframe of requested dataset from Quandl.
    :param dataset: str or list, depending on single dataset usage or multiset usage
            Dataset codes are available on the Quandl website
    :param str api_key: Downloads are limited to 50 unless api_key is specified
    :param str start_date, end_date: Optional datefilers, otherwise entire
           dataset is returned
    :param str collapse: Options are daily, weekly, monthly, quarterly, annual
    :param str transform: options are diff, rdiff, cumul, and normalize
    :param int rows: Number of rows which will be returned
    :param str order: options are asc, desc. Default: `asc`
    :param str returns: specify what format you wish your dataset returned as,
        either `numpy` for a numpy ndarray or `pandas`. Default: `pandas`
    :returns: :class:`pandas.DataFrame` or :class:`numpy.ndarray`
    Note that Pandas expects timeseries data to be sorted ascending for most
    timeseries functionality to work.
    Any other `kwargs` passed to `get` are sent as field/value params to Quandl
    with no interference.
    """
    _convert_params_to_v3(kwargs)
    data_format = kwargs.pop('returns', 'pandas')
    ApiKeyUtil.init_api_key_from_args(kwargs)
    # Check whether dataset is given as a string
    # (for a single dataset) or an array (for a multiset call)
    # Unicode String
    if isinstance(dataset, string_types):
        dataset_args = _parse_dataset_code(dataset)
        if dataset_args['column_index'] is not None:
            kwargs.update({'column_index': dataset_args['column_index']})
        data = Dataset(dataset_args['code']).data(params=kwargs, handle_column_not_found=True)
    # Array
    elif isinstance(dataset, list):
        args = _build_merged_dataset_args(dataset)
        # handle_not_found_error if set to True will add an empty DataFrame
        # for a non-existent dataset instead of raising an error
        data = MergedDataset(args).data(params=kwargs,
                                        handle_not_found_error=True,
                                        handle_column_not_found=True)
    # If wrong format
    else:
        raise InvalidRequestError(Message.ERROR_DATASET_FORMAT)
    if data_format == 'numpy':
        return data.to_numpy()
    return data.to_pandas()

def _parse_dataset_code(dataset):
    if '.' not in dataset:
        return {'code': dataset, 'column_index': None}
    dataset_temp = dataset.split('.')
    if not dataset_temp[1].isdigit():
        raise ValueError(Message.ERROR_COLUMN_INDEX_TYPE % dataset)
    return {'code': dataset_temp[0], 'column_index': int(dataset_temp[1])}
My makeshift get function
import pandas_datareader.data as web
from datetime import date, timedelta
start = date.today()-timedelta(days=1080)
end = date.today()
def get(ticker):
    df = web.DataReader(name=ticker.upper(), data_source='iex', start=start, end=end)
    return df
Best answer
The problem comes down to the columns and column names that Quandl and IEX return.
Quandl returns:
Date Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
whereas IEX returns:
date open high low close volume
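A quick way to see where the KeyError comes from is to mimic Stocker's reset_index step on a small hand-made frame. This is only a sketch: the sample values are made up, and it assumes the IEX frame is indexed by date under the lowercase name date, as the listing above suggests.

```python
import pandas as pd

# Synthetic stand-in for an IEX-style frame (made-up values):
# lowercase column names and an index named 'date'.
iex_like = pd.DataFrame(
    {"open": [10.0, 11.0], "close": [10.5, 11.5]},
    index=pd.Index(["2019-04-18", "2019-04-22"], name="date"),
)

# Stocker's first step: move the index into a regular column.
stock = iex_like.reset_index(level=0)
print(list(stock.columns))  # ['date', 'open', 'close']

# Stocker's next step looks up 'Date', which does not exist here.
try:
    stock["ds"] = stock["Date"]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

The column created by reset_index takes the index's name, so it comes out as date rather than Date, and the lookup fails.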
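IEX returns adjusted prices, so you can map the IEX 'close' column to Quandl's 'Adj. Close'.
So if you want to keep using the Stocker format (the Quandl format), you can create the required columns like this: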
# >-Quandl format-< >-- IEX --<
stock['Adj. Close'] = stock['close']
stock['Date'] = stock['date']
etc...
Note that you may need to convert the string dates from IEX to datetime format.
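Putting the column mapping and the date conversion together, one possible adapter looks like the sketch below. The helper name to_quandl_format is hypothetical (it is not part of Stocker, quandl, or pandas_datareader), the sample values are made up, and the code assumes the IEX frame is indexed by date strings and that its prices are already adjusted, as stated above.

```python
import pandas as pd

# Hypothetical helper, not part of Stocker, quandl, or pandas_datareader.
def to_quandl_format(iex_df):
    """Return a copy of an IEX-style frame with Quandl-style names."""
    df = iex_df.copy()
    # Parse string dates so later calls such as .date() work.
    df.index = pd.to_datetime(df.index)
    # Stocker's reset_index step turns the index name into the column name.
    df.index.name = "Date"
    # 'open' -> 'Open', 'close' -> 'Close', and so on.
    df = df.rename(columns=str.capitalize)
    # Assumption: IEX prices are already adjusted, so mirror them
    # into the 'Adj. ...' columns that Stocker reads.
    for col in ("Open", "High", "Low", "Close", "Volume"):
        if col in df.columns:
            df["Adj. " + col] = df[col]
    return df

# Made-up sample shaped like the IEX listing (no network call).
sample = pd.DataFrame(
    {
        "open": [10.0, 11.0],
        "high": [10.8, 11.9],
        "low": [9.9, 10.7],
        "close": [10.5, 11.5],
        "volume": [1000, 1200],
    },
    index=pd.Index(["2019-04-18", "2019-04-22"], name="date"),
)

# Same step Stocker performs right after fetching the data.
stock = to_quandl_format(sample).reset_index(level=0)
print(stock[["Date", "Adj. Open", "Adj. Close"]])
```

With something like this in place, the custom get function could end with return to_quandl_format(df) so that Stocker's own code stays unchanged.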
Regarding python - problem reading a pandas DataFrame into Stocker, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55802131/