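python - Problem reading a pandas DataFrame into Stocker

Tags: python pandas quandl

I recently started working on a project that uses Stocker (an API that runs fbprophet for machine learning on stock data). I like the simplicity of the API, but it has one fatal flaw: it uses quandl to retrieve its stock data. Quandl stopped updating their data sometime in 2018, and you cannot build an accurate model on stale data. I looked through the Stocker code and, as far as I can tell, it uses quandl in only one line, namely: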

stock = quandl.get('%s/%s' % (exchange, ticker))

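This quandl call returns the stock data as a pandas DataFrame. I figured that since this is all quandl is used for, I could write my own version of quandl that fetches the data from a different source (IEX) and returns it as a DataFrame. I wrote the code (attached below), but I keep getting this error when creating a model in Stocker: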

  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Date'

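I'm pretty lost on this one, and I'm not very familiar with pandas. Any help is greatly appreciated!

The relevant part of Stocker, showing how it uses quandl to get the stock data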

# Quandl for financial analysis, pandas and numpy for data manipulation
# fbprophet for additive models, #pytrends for Google trend data
#import quandl
import stockdata
import pandas as pd
import numpy as np
import fbprophet
import pytrends
from pytrends.request import TrendReq

# matplotlib pyplot for plotting
import matplotlib.pyplot as plt

import matplotlib

# Class for analyzing and (attempting) to predict future prices
# Contains a number of visualizations and analysis methods
class Stocker():

    # Initialization requires a ticker symbol
    def __init__(self, ticker, exchange='IEX'):

        # Enforce capitalization
        ticker = ticker.upper()

        # Symbol is used for labeling plots
        self.symbol = ticker

        # Use Personal Api Key
        # quandl.ApiConfig.api_key = 'YourKeyHere'

        # Retrieve the financial data
        try:
            stock = stockdata.get(ticker)
            print(stock)

        except Exception as e:
            print('Error Retrieving Data.')
            print(e)
            return

        # Set the index to a column called Date
        stock = stock.reset_index(level=0)

        # Columns required for prophet
        stock['ds'] = stock['Date']

        if ('Adj. Close' not in stock.columns):
            stock['Adj. Close'] = stock['Close']
            stock['Adj. Open'] = stock['Open']

        stock['y'] = stock['Adj. Close']
        stock['Daily Change'] = stock['Adj. Close'] - stock['Adj. Open']

        # Data assigned as class attribute
        self.stock = stock.copy()

        # Minimum and maximum date in range
        self.min_date = min(stock['Date'])
        self.max_date = max(stock['Date'])

        # Find max and min prices and dates on which they occurred
        self.max_price = np.max(self.stock['y'])
        self.min_price = np.min(self.stock['y'])

        self.min_price_date = self.stock[self.stock['y'] == self.min_price]['Date']
        self.min_price_date = self.min_price_date[self.min_price_date.index[0]]
        self.max_price_date = self.stock[self.stock['y'] == self.max_price]['Date']
        self.max_price_date = self.max_price_date[self.max_price_date.index[0]]

        # The starting price (starting with the opening price)
        self.starting_price = float(self.stock.loc[0, 'Adj. Open'])

        # The most recent price
        self.most_recent_price = float(self.stock.loc[len(self.stock) - 1, 'y'])

        # Whether or not to round dates
        self.round_dates = True

        # Number of years of data to train on
        self.training_years = 3

        # Prophet parameters
        # Default prior from library
        self.changepoint_prior_scale = 0.05 
        self.weekly_seasonality = False
        self.daily_seasonality = False
        self.monthly_seasonality = True
        self.yearly_seasonality = True
        self.changepoints = None

        print('{} Stocker Initialized. Data covers {} to {}.'.format(self.symbol,
                                                                     self.min_date.date(),
                                                                     self.max_date.date()))

Quandl's get function

def get(dataset, **kwargs):
    """Return dataframe of requested dataset from Quandl.
    :param dataset: str or list, depending on single dataset usage or multiset usage
            Dataset codes are available on the Quandl website
    :param str api_key: Downloads are limited to 50 unless api_key is specified
    :param str start_date, end_date: Optional datefilers, otherwise entire
           dataset is returned
    :param str collapse: Options are daily, weekly, monthly, quarterly, annual
    :param str transform: options are diff, rdiff, cumul, and normalize
    :param int rows: Number of rows which will be returned
    :param str order: options are asc, desc. Default: `asc`
    :param str returns: specify what format you wish your dataset returned as,
        either `numpy` for a numpy ndarray or `pandas`. Default: `pandas`
    :returns: :class:`pandas.DataFrame` or :class:`numpy.ndarray`
    Note that Pandas expects timeseries data to be sorted ascending for most
    timeseries functionality to work.
    Any other `kwargs` passed to `get` are sent as field/value params to Quandl
    with no interference.
    """

    _convert_params_to_v3(kwargs)

    data_format = kwargs.pop('returns', 'pandas')

    ApiKeyUtil.init_api_key_from_args(kwargs)

    # Check whether dataset is given as a string
    # (for a single dataset) or an array (for a multiset call)

    # Unicode String
    if isinstance(dataset, string_types):
        dataset_args = _parse_dataset_code(dataset)
        if dataset_args['column_index'] is not None:
            kwargs.update({'column_index': dataset_args['column_index']})
        data = Dataset(dataset_args['code']).data(params=kwargs, handle_column_not_found=True)
    # Array
    elif isinstance(dataset, list):
        args = _build_merged_dataset_args(dataset)
        # handle_not_found_error if set to True will add an empty DataFrame
        # for a non-existent dataset instead of raising an error
        data = MergedDataset(args).data(params=kwargs,
                                        handle_not_found_error=True,
                                        handle_column_not_found=True)
    # If wrong format
    else:
        raise InvalidRequestError(Message.ERROR_DATASET_FORMAT)

    if data_format == 'numpy':
        return data.to_numpy()
    return data.to_pandas()


def _parse_dataset_code(dataset):
    if '.' not in dataset:
        return {'code': dataset, 'column_index': None}
    dataset_temp = dataset.split('.')
    if not dataset_temp[1].isdigit():
        raise ValueError(Message.ERROR_COLUMN_INDEX_TYPE % dataset)
    return {'code': dataset_temp[0], 'column_index': int(dataset_temp[1])}

My makeshift get function

import pandas_datareader.data as web
from datetime import date, timedelta

start = date.today()-timedelta(days=1080)
end = date.today()

def get(ticker):
    df = web.DataReader(name=ticker.upper(), data_source='iex', start=start, end=end)
    return df

Best answer

The problem comes down to the columns and column names returned by Quandl versus IEX.

Quandl returns:

Date Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open   Adj. High   Adj. Low    Adj. Close  Adj. Volume

whereas IEX returns:

date open high low close volume
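
To see why this difference triggers the KeyError, here is a small reproduction. It is only a sketch: the prices are made up, and it assumes the IEX frame is indexed by a lowercase `date` holding strings, which is how the column list above reads.

```python
import pandas as pd

# Made-up frame in the IEX layout: lowercase names, string dates as the index
iex_like = pd.DataFrame(
    {
        "open": [10.0, 10.5],
        "high": [10.8, 11.0],
        "low": [9.9, 10.4],
        "close": [10.6, 10.9],
        "volume": [1000, 1200],
    },
    index=pd.Index(["2019-04-18", "2019-04-22"], name="date"),
)

# Same step Stocker performs: move the index into a regular column
stock = iex_like.reset_index(level=0)
print(list(stock.columns))

# Stocker then looks up 'Date' (capital D), which does not exist here
try:
    stock["Date"]
    date_lookup_failed = False
except KeyError:
    date_lookup_failed = True
print("KeyError on 'Date':", date_lookup_failed)
```

The column created by `reset_index` takes the index's own name, so it comes out as `date` rather than the `Date` that Stocker expects.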

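IEX returns adjusted prices, so you can map the IEX 'close' column to Quandl's 'Adj. Close'.

So if you want to keep using the Stocker (Quandl) format, you can create the required columns like this: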

# >-Quandl format-<    >-- IEX --<
stock['Adj. Close']  = stock['close']
stock['Date']        = stock['date']

etc...

Note that you may need to convert the string dates from IEX to datetime format.
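
Putting the column mapping and the date conversion together, one possible adapter could look like the sketch below. The helper name `to_quandl_format` and the sample prices are hypothetical, and it assumes the IEX frame is indexed by string dates under the name `date`, with prices that are already adjusted as described above.

```python
import pandas as pd


def to_quandl_format(iex_df):
    """Return a copy of an IEX-style frame using Quandl-style names (hypothetical helper)."""
    out = iex_df.copy()

    # Convert string dates to timestamps and use the index name Stocker expects
    out.index = pd.to_datetime(out.index)
    out.index.name = "Date"

    # Capitalize the column names to match Quandl
    out = out.rename(columns={"open": "Open", "high": "High", "low": "Low",
                              "close": "Close", "volume": "Volume"})

    # Assumption: IEX prices are already adjusted, so reuse them for the 'Adj.' columns
    for col in ("Open", "High", "Low", "Close"):
        out["Adj. " + col] = out[col]
    return out


# Made-up sample in the IEX layout, for illustration only
sample = pd.DataFrame(
    {"open": [10.0, 10.5], "high": [10.8, 11.0], "low": [9.9, 10.4],
     "close": [10.6, 10.9], "volume": [1000, 1200]},
    index=pd.Index(["2019-04-18", "2019-04-22"], name="date"),
)

converted = to_quandl_format(sample)

# Same step Stocker performs after fetching the data
stock = converted.reset_index(level=0)
print(stock[["Date", "Adj. Open", "Adj. Close"]])
```

With a frame shaped like this, returning `to_quandl_format(df)` from the custom `get` function should let Stocker's `stock['Date']` lookup and its later `.date()` calls work without changing Stocker itself.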

This post, python - Problem reading a pandas DataFrame into Stocker, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/55802131/
