python - DataFrame not printing correctly

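I saved a DataFrame to csv, made some changes, and then tried to load it again. For some reason the date column is all mixed up.

Can someone help me understand why this happens? Before saving to csv, my df looked like this: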

aapl = web.DataReader("AAPL", "yahoo", start, end)
bbry = web.DataReader("BBRY", "yahoo", start, end)
lulu = web.DataReader("LULU", "yahoo", start, end)
amzn = web.DataReader("AMZN", "yahoo", start, end)

# Build a DataFrame of the adjusted closing prices of these stocks, aligned to business month-end dates
stocks = pd.DataFrame({"AAPL": aapl["Adj Close"],
                      "BBRY": bbry["Adj Close"],
                      "LULU": lulu["Adj Close"],
                      "AMZN":amzn["Adj Close"]}, pd.date_range(start, end, freq='BM'))

stocks.head()


Out[60]:
                 AAPL        AMZN       BBRY       LULU
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-30  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-30  78.414750  202.509995  14.700000  74.730003

In [74]:

stocks.to_csv('A5.csv', encoding='utf-8')

After reading the correct csv back in, it now looks like this:

In [81]:

stocks1.head()
Out[81]:
            Unnamed: 0       AAPL        AMZN       BBRY       LULU
0  2011-11-30 00:00:00  49.987684  192.289993  17.860001  49.700001
1  2011-12-30 00:00:00  52.969683  173.100006  14.500000  46.660000
2  2012-01-31 00:00:00  59.702715  194.440002  16.629999  63.130001
3  2012-02-29 00:00:00  70.945373  179.690002  14.170000  67.019997
4  2012-03-30 00:00:00  78.414750  202.509995  14.700000  74.730003

Why doesn't it recognize the date column as dates?

Thanks for your help.

Best answer

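I would suggest using HDF storage instead of CSV: it is faster, it preserves your data types, it lets you conditionally select subsets of your data set, it supports fast compression, and so on.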

import pandas as pd
import pandas_datareader.data as web

stocklist = ['AAPL','BBRY','LULU','AMZN']
p = web.DataReader(stocklist, 'yahoo', '2011-11-01', '2012-04-01')
df = p['Adj Close'].resample('M').last()
print(df)

# saving DF to HDF file
store = pd.HDFStore(r'd:/temp/stocks.h5')
store.append('stocks', df, data_columns=True, complib='blosc', complevel=5)
store.close()

Output:

                 AAPL        AMZN       BBRY       LULU
Date
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-31  78.414750  202.509995  14.700000  74.730003

Let's read the data back from the HDF file:

In [9]: store = pd.HDFStore(r'd:/temp/stocks.h5')

In [10]: x = store.select('stocks')

In [11]: x
Out[11]:
                 AAPL        AMZN       BBRY       LULU
Date
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-31  78.414750  202.509995  14.700000  74.730003

You can select data conditionally:

In [12]: x = store.select('stocks', where="AAPL >= 50 and AAPL <= 70")

In [13]: x
Out[13]:
                 AAPL        AMZN       BBRY       LULU
Date
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001

Check the index dtype:

In [14]: x.index.dtype
Out[14]: dtype('<M8[ns]')

In [15]: str(x.index.dtype)
Out[15]: 'datetime64[ns]'
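
If you would rather stay with CSV, the dates can usually be recovered at read time. The sketch below is not part of the original answer: it assumes the file was written with a plain `to_csv` call as in the question, uses an in-memory buffer in place of `A5.csv`, and reuses a few values from the question's output. By default `read_csv` treats the saved index as an ordinary unnamed column; passing `index_col=0` and `parse_dates=True` asks pandas to use that column as the index and parse it as dates.

```python
import io

import pandas as pd

# A small stand-in for the question's DataFrame (values copied from its output).
idx = pd.to_datetime(["2011-11-30", "2011-12-30", "2012-01-31"])
stocks = pd.DataFrame(
    {"AAPL": [49.987684, 52.969683, 59.702715],
     "AMZN": [192.289993, 173.100006, 194.440002]},
    index=idx,
)

# Write to an in-memory buffer instead of a file on disk (assumption for the demo).
buf = io.StringIO()
stocks.to_csv(buf)

# Default read: the old index becomes a regular column named "Unnamed: 0".
buf.seek(0)
plain = pd.read_csv(buf)

# Read again, naming the first column as the index and parsing it as dates.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, parse_dates=True)

print(plain.columns.tolist())
print(type(restored.index).__name__)
```

With a real file, the same two keyword arguments would be passed to `pd.read_csv('A5.csv', ...)`.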

This Q&A on "python - DataFrame not printing correctly" comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/40406130/
