python - pandas DataFrame drops its index when passed to kdb+ (using the qPython API)

Tags: python pandas kdb q-lang exxeleron-q

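I am trying to pass time-series data from Python to q/kdb+.

One solution is the qPython module, which provides seamless conversion between q tables/dictionaries and pandas.

The problem is that when I pass a pandas DataFrame to q, the DataFrame's time index (the Date column) does not make it to the q side. Reproducible code: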

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#    Open  High  Low  Close    Volume  Adj Close
# 0 10.17 10.28 10.05 10.28  60855800       9.43 
# 1 10.45 11.24 10.40 10.96 215620200      10.05
# 2 11.21 11.46 11.13 11.37 200070600      10.43
# 3 11.46 11.69 11.32 11.66 130201700      10.69
# 4 11.67 11.74 11.46 11.69 130463000      10.72

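As you can see, the q table lacks the Date column, which exists as the index of the DataFrame f.

How can I pass the datetime index to q efficiently (for large data)?

Best answer

When serializing a DataFrame object, qPython checks whether the meta attribute is present. If the attribute is absent, the DataFrame is serialized as a plain q table and the index columns are skipped in the process. If you want to preserve the index columns, you have to set the meta attribute and provide a type hint that forces the DataFrame to be represented as a q keyed table.

Please take a look at the modified example: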

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

from qpython import MetaData
from qpython.qtype import QKEYED_TABLE


start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date                                                         
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72
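
If a keyed table is not required on the q side, one possible alternative (not part of the answer above, and offered only as an untested-against-q sketch) is to turn the index into an ordinary column with pandas' reset_index() before sending the DataFrame. Since the answer states that only index columns are skipped, a regular Date column should then travel with the rest of the table. The small hand-built frame below is an assumption used for illustration; it reuses a few values from the output above so that the snippet runs without a network connection or a q server.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: three rows copied from
# the output shown above, with a DatetimeIndex named "Date".
f = pd.DataFrame(
    {
        "Open": [10.17, 10.45, 11.21],
        "Close": [10.28, 10.96, 11.37],
        "Volume": [60855800, 215620200, 200070600],
    },
    index=pd.DatetimeIndex(
        ["2010-01-04", "2010-01-05", "2010-01-06"], name="Date"
    ),
)

# reset_index() moves the named index into a regular first column and
# replaces the index with a default integer range.
flat = f.reset_index()

print(list(flat.columns))  # ['Date', 'Open', 'Close', 'Volume']
```

The flattened frame could then be sent with the same call as before, q('set', numpy.string_('yahoo'), flat), without setting the meta attribute. How qPython maps the datetime64 column to a q temporal type is not covered by the answer, so it is worth checking with meta yahoo on the q side. If a key is wanted later, it can presumably be applied in q afterwards, for example with `Date xkey `yahoo.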

Source: Stack Overflow, https://stackoverflow.com/questions/28385137/
