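I saved a DataFrame to CSV, made some changes, and then tried to read it back in. For some reason the date column is now all mixed up.
Can someone tell me why this happens? Before saving to CSV, my df looked like this: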
aapl = web.DataReader("AAPL", "yahoo", start, end)
bbry = web.DataReader("BBRY", "yahoo", start, end)
lulu = web.DataReader("LULU", "yahoo", start, end)
amzn = web.DataReader("AMZN", "yahoo", start, end)
# Build a DataFrame of the adjusted closing prices, one column per ticker,
# reindexed to business month ends
stocks = pd.DataFrame({"AAPL": aapl["Adj Close"],
                       "BBRY": bbry["Adj Close"],
                       "LULU": lulu["Adj Close"],
                       "AMZN": amzn["Adj Close"]},
                      index=pd.date_range(start, end, freq='BM'))
stocks.head()
Out[60]:
AAPL AMZN BBRY LULU
2011-11-30 49.987684 192.289993 17.860001 49.700001
2011-12-30 52.969683 173.100006 14.500000 46.660000
2012-01-31 59.702715 194.440002 16.629999 63.130001
2012-02-29 70.945373 179.690002 14.170000 67.019997
2012-03-30 78.414750 202.509995 14.700000 74.730003
In [74]:
stocks.to_csv('A5.csv', encoding='utf-8')
After reading the CSV back in, it now looks like this:
In [81]:
stocks1.head()
Out[81]:
Unnamed: 0 AAPL AMZN BBRY LULU
0 2011-11-30 00:00:00 49.987684 192.289993 17.860001 49.700001
1 2011-12-30 00:00:00 52.969683 173.100006 14.500000 46.660000
2 2012-01-31 00:00:00 59.702715 194.440002 16.629999 63.130001
3 2012-02-29 00:00:00 70.945373 179.690002 14.170000 67.019997
4 2012-03-30 00:00:00 78.414750 202.509995 14.700000 74.730003
Why doesn't it recognize the date column as dates?
Thanks for your help.
Best answer
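I would suggest using HDF storage instead of CSV: it is faster, it preserves your data types, it lets you conditionally select subsets of your data, it supports fast compression, and so on.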
import pandas as pd
import pandas_datareader.data as web
stocklist = ['AAPL','BBRY','LULU','AMZN']
p = web.DataReader(stocklist, 'yahoo', '2011-11-01', '2012-04-01')
df = p['Adj Close'].resample('M').last()
print(df)
# saving DF to HDF file
store = pd.HDFStore(r'd:/temp/stocks.h5')
store.append('stocks', df, data_columns=True, complib='blosc', complevel=5)
store.close()
Output:
AAPL AMZN BBRY LULU
Date
2011-11-30 49.987684 192.289993 17.860001 49.700001
2011-12-31 52.969683 173.100006 14.500000 46.660000
2012-01-31 59.702715 194.440002 16.629999 63.130001
2012-02-29 70.945373 179.690002 14.170000 67.019997
2012-03-31 78.414750 202.509995 14.700000 74.730003
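Note that the row labels above are calendar month ends (for example 2011-12-31), while the question built its index with freq='BM', which gives business month ends (2011-12-30). A small sketch of the difference, assuming synthetic prices on business days rather than real quotes; the offset objects are used here instead of the 'M' and 'BM' string aliases only to avoid depending on a particular pandas version:

```python
import pandas as pd

# Synthetic daily "prices" on business days (made-up values, not real quotes).
days = pd.bdate_range("2011-11-01", "2012-03-30")
prices = pd.Series(range(len(days)), index=days, dtype="float64", name="price")

# Last observation per month, labelled with the calendar month end.
calendar_month_end = prices.resample(pd.offsets.MonthEnd()).last()

# Same observations, labelled with the last business day of the month.
business_month_end = prices.resample(pd.offsets.BMonthEnd()).last()

print(calendar_month_end.index[1].date())  # December 2011 label, calendar
print(business_month_end.index[1].date())  # December 2011 label, business
```

The values are identical in both cases; only the labels differ, because 2011-12-31 fell on a Saturday.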
Let's read the data back from the HDF file:
In [9]: store = pd.HDFStore(r'd:/temp/stocks.h5')
In [10]: x = store.select('stocks')
In [11]: x
Out[11]:
AAPL AMZN BBRY LULU
Date
2011-11-30 49.987684 192.289993 17.860001 49.700001
2011-12-31 52.969683 173.100006 14.500000 46.660000
2012-01-31 59.702715 194.440002 16.629999 63.130001
2012-02-29 70.945373 179.690002 14.170000 67.019997
2012-03-31 78.414750 202.509995 14.700000 74.730003
You can select data conditionally:
In [12]: x = store.select('stocks', where="AAPL >= 50 and AAPL <= 70")
In [13]: x
Out[13]:
AAPL AMZN BBRY LULU
Date
2011-12-31 52.969683 173.100006 14.500000 46.660000
2012-01-31 59.702715 194.440002 16.629999 63.130001
Check the index dtype:
In [14]: x.index.dtype
Out[14]: dtype('<M8[ns]')
In [15]: str(x.index.dtype)
Out[15]: 'datetime64[ns]'
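If you would rather stay with CSV, the dates can usually be restored when reading the file. CSV stores only text, so by default read_csv brings the saved index back as an ordinary unnamed column. A minimal sketch, assuming the file was written with to_csv as in the question; the small frame and the in-memory buffer below are stand-ins for stocks and A5.csv:

```python
import io

import pandas as pd

# Stand-in for the `stocks` frame (a few values copied from the question).
idx = pd.to_datetime(["2011-11-30", "2011-12-30", "2012-01-31"])
stocks = pd.DataFrame(
    {
        "AAPL": [49.987684, 52.969683, 59.702715],
        "AMZN": [192.289993, 173.100006, 194.440002],
    },
    index=idx,
)

# Write to an in-memory buffer instead of 'A5.csv' to keep the sketch self-contained.
buf = io.StringIO()
stocks.to_csv(buf)

# Default read: the old index comes back as a text column named "Unnamed: 0".
buf.seek(0)
plain = pd.read_csv(buf)

# Use the first column as the index and parse it as dates.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, parse_dates=True)

print(plain.columns[0])
print(type(restored.index).__name__)
```

With a real file, the same two arguments apply: pd.read_csv('A5.csv', index_col=0, parse_dates=True).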
For more on "python - DataFrame not printing correctly", see the similar question on Stack Overflow: https://stackoverflow.com/questions/40406130/