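I am trying to pass time series data from Python to q/kdb+.
One solution is the qPython module, which provides seamless conversion of tables/dictionaries from q to pandas.
The problem is that when passing data from pandas to q, the time index of the DataFrame (the Date column) does not make it to the q side. Reproducible code: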
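import pandas_datareader.data as web # pandas.io.data was removed from pandas; DataReader now lives in the separate pandas-datareader package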
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.head() # explore first 5 rows of the DataFrame (the original f.ix[:5] relies on .ix, which was removed in pandas 1.0)
# Out:
# Open High Low Close Volume Adj Close
# Date
# 2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
# Open High Low Close Volume Adj Close
# 0 10.17 10.28 10.05 10.28 60855800 9.43
# 1 10.45 11.24 10.40 10.96 215620200 10.05
# 2 11.21 11.46 11.13 11.37 200070600 10.43
# 3 11.46 11.69 11.32 11.66 130201700 10.69
# 4 11.67 11.74 11.46 11.69 130463000 10.72
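As you can see, the q table does not have the Date column that exists as the index of the DataFrame f.
How can the datetime index be passed to q efficiently (for large data)?
Best answer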
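When serializing a DataFrame object, qPython checks whether the meta attribute is present. If the attribute is absent, the DataFrame is serialized as a plain q table and the index columns are skipped in the process. If you want to preserve the index columns, you have to set the meta attribute and provide a type hint that forces a q keyed table representation.
Please take a look at the modified example: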
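import pandas_datareader.data as web # pandas.io.data was removed from pandas; DataReader now lives in the separate pandas-datareader package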
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
from qpython import MetaData
from qpython.qtype import QKEYED_TABLE
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.head() # explore first 5 rows of the DataFrame (the original f.ix[:5] relies on .ix, which was removed in pandas 1.0)
# Out:
# Open High Low Close Volume Adj Close
# Date
# 2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
# Open High Low Close Volume Adj Close
# Date
# 2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
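If a keyed table is not required on the q side, another option (not part of the answer above, so treat it as an untested sketch as far as qPython is concerned) is to turn the index into an ordinary column with pandas' reset_index() before sending the DataFrame. The snippet below only demonstrates the pandas step on a small stand-in frame whose values are copied from the first rows shown above; the name `flat` is made up for illustration, and no q connection is involved.

```python
import pandas as pd

# Stand-in for the downloaded data: a few rows with a DatetimeIndex named "Date".
f = pd.DataFrame(
    {"Open": [10.17, 10.45, 11.21], "Close": [10.28, 10.96, 11.37]},
    index=pd.DatetimeIndex(["2010-01-04", "2010-01-05", "2010-01-06"], name="Date"),
)

# reset_index() moves the index into a regular column named after the index,
# so serialization no longer has to treat it as an index.
flat = f.reset_index()

print(flat.columns.tolist())
print(flat.dtypes["Date"])
```

Passing `flat` instead of `f` to q('set', ...) would then presumably produce an unkeyed q table with Date as its first column; whether that suits you depends on whether you need the key on the q side.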
Source: "python - pandas DataFrame drops index when passing to kdb+ (using qPython API)" on Stack Overflow: https://stackoverflow.com/questions/28385137/