python - DataFrame not printing correctly

Tags: python python-2.7 pandas dataframe

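I saved a DataFrame to CSV, made some changes, and then tried to load it again. For some reason, the date column is now all mixed up.

Can someone tell me why this happens? Before saving to CSV, my df looked like this: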

aapl = web.DataReader("AAPL", "yahoo", start, end)
bbry = web.DataReader("BBRY", "yahoo", start, end)
lulu = web.DataReader("LULU", "yahoo", start, end)
amzn = web.DataReader("AMZN", "yahoo", start, end)

# Build a DataFrame of the adjusted closing prices from a dict of Series,
# reindexed to the last business day of each month
stocks = pd.DataFrame({"AAPL": aapl["Adj Close"],
                       "BBRY": bbry["Adj Close"],
                       "LULU": lulu["Adj Close"],
                       "AMZN": amzn["Adj Close"]},
                      index=pd.date_range(start, end, freq='BM'))

stocks.head()

Out[60]:
                 AAPL        AMZN       BBRY       LULU
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-30  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-30  78.414750  202.509995  14.700000  74.730003
In [74]:

stocks.to_csv('A5.csv', encoding='utf-8')

After reading the CSV back in, it now looks like this:

In [81]:

stocks1.head()
Out[81]:
            Unnamed: 0       AAPL        AMZN       BBRY       LULU
0  2011-11-30 00:00:00  49.987684  192.289993  17.860001  49.700001
1  2011-12-30 00:00:00  52.969683  173.100006  14.500000  46.660000
2  2012-01-31 00:00:00  59.702715  194.440002  16.629999  63.130001
3  2012-02-29 00:00:00  70.945373  179.690002  14.170000  67.019997
4  2012-03-30 00:00:00  78.414750  202.509995  14.700000  74.730003

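Why doesn't it recognize the date column as dates?

Thanks for your help.

Best answer

I would suggest using HDF storage instead of CSV: it is faster, it preserves your data types, you can conditionally select subsets of the data set, it supports fast compression, and so on.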

import pandas as pd
import pandas_datareader.data as web

stocklist = ['AAPL','BBRY','LULU','AMZN']
p = web.DataReader(stocklist, 'yahoo', '2011-11-01', '2012-04-01')
df = p['Adj Close'].resample('M').last()
print(df)

# saving DF to HDF file
store = pd.HDFStore(r'd:/temp/stocks.h5')
store.append('stocks', df, data_columns=True, complib='blosc', complevel=5)
store.close()

Output:

                 AAPL        AMZN       BBRY       LULU
Date
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-31  78.414750  202.509995  14.700000  74.730003

Let's read the data back from the HDF file:

In [9]: store = pd.HDFStore(r'd:/temp/stocks.h5')

In [10]: x = store.select('stocks')

In [11]: x
Out[11]:
                 AAPL        AMZN       BBRY       LULU
Date
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-31  78.414750  202.509995  14.700000  74.730003

You can select data conditionally:

In [12]: x = store.select('stocks', where="AAPL >= 50 and AAPL <= 70")

In [13]: x
Out[13]:
                 AAPL        AMZN       BBRY       LULU
Date
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001

Check the index dtype:

In [14]: x.index.dtype
Out[14]: dtype('<M8[ns]')

In [15]: x.index.dtype_str
Out[15]: 'datetime64[ns]'
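
If you would rather stay with CSV, the dates can usually be recovered when reading the file back. The snippet below is only a minimal sketch, not part of the original answer: it uses a small hand-made frame with three of the question's rows and an in-memory buffer in place of 'A5.csv' (both are stand-ins for illustration). It shows that a plain read_csv leaves the old index as an ordinary "Unnamed: 0" column, while passing index_col=0 and parse_dates=True restores it as a datetime index.

```python
import io

import pandas as pd

# Stand-in data (assumption): three rows and two columns from the question.
stocks = pd.DataFrame(
    {"AAPL": [49.987684, 52.969683, 59.702715],
     "AMZN": [192.289993, 173.100006, 194.440002]},
    index=pd.to_datetime(["2011-11-30", "2011-12-30", "2012-01-31"]),
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
stocks.to_csv(buffer)

# Plain read: the unnamed index comes back as a regular column of strings.
buffer.seek(0)
naive = pd.read_csv(buffer)

# index_col=0 makes the first column the index again;
# parse_dates=True asks pandas to parse that index as dates.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(naive.columns[0])   # Unnamed: 0
print(type(restored.index).__name__)
```

With a real file, the same two keyword arguments would be passed to pd.read_csv('A5.csv', ...). Unlike HDF, CSV stores no type information, so the reader has to be told which column holds the dates.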

Regarding "python - DataFrame not printing correctly", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40406130/
