我有以下多索引时间序列数据。
first 001 \
second open high low close jdiff_vol value
date time
20150721 90100 2082.18 2082.18 2082.18 2082.18 11970 99466
90200 2082.72 2083.01 2082.18 2083.01 4886 40108
90300 2083.68 2084.20 2083.68 2083.98 6966 48847
90400 2083.63 2084.21 2083.63 2084.00 6817 48020
90500 2084.03 2084.71 2083.91 2084.32 10193 58399
20150721 90100 2084.14 2084.22 2083.59 2083.65 7860 39128
90200 2084.08 2084.08 2083.47 2083.50 7171 39147
90300 2083.25 2083.65 2083.08 2083.60 4549 34373
90400 2084.06 2084.06 2083.66 2083.80 6980 38088
90500 2083.61 2084.04 2083.27 2083.89 5292 33466
下面的代码有效。
opens = data.xs('open', level='second', axis=1, drop_level=True)
但是,使用以下代码选择多列失败。
opens = data.xs(('open','close'), level='second', axis=1, drop_level=True)
如何修改它以选择多列?
最佳答案
直到现在,还无法在同一级别使用具有两个列键的 pandas xs() 函数。只能使用来自不同级别的两个 key :
opens = data.xs(('001','close'), level=('first','second'), axis=1, drop_level=True)
但是,这并不是您要找的。另一种解决方案是执行两个 pandas xs() 函数,然后concat它们:
df_xs = pd.concat([df.xs('open', level='second', axis=1, drop_level=True), df.xs('close', level='second', axis=1, drop_level=True)])
这是一个完整的例子。首先,创建一个数据框:
import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
print(df)
first bar baz foo
second one two one two one two
first second
bar one 0.699065 -0.283550 0.072595 -0.699627 0.879832 -1.787520
two -1.172970 1.381607 1.941370 0.577451 -0.182819 0.215879
baz one 0.669402 -0.018534 0.775114 1.277079 0.404116 -2.450712
two 0.066530 -0.509366 1.249981 2.426217 0.409881 -0.178713
foo one 1.098217 0.399427 -1.423057 -1.261542 1.668202 0.187629
two 0.827283 0.974239 -1.944796 0.266321 0.700679 -0.371074
然后你可以用一个concat执行xs():
df_xs = pd.concat([df.xs('one', level='second', axis=1, drop_level=True), df.xs('two', level='second', axis=1, drop_level=True)])
print (df_xs)
first bar baz foo
first second
bar one 0.699065 0.072595 0.879832
two -1.172970 1.941370 -0.182819
baz one 0.669402 0.775114 0.404116
two 0.066530 1.249981 0.409881
foo one 1.098217 -1.423057 1.668202
two 0.827283 -1.944796 0.700679
bar one -0.283550 -0.699627 -1.787520
two 1.381607 0.577451 0.215879
baz one -0.018534 1.277079 -2.450712
two -0.509366 2.426217 -0.178713
foo one 0.399427 -1.261542 0.187629
two 0.974239 0.266321 -0.371074
关于python - pandas xs 函数的多列选择失败,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/36521388/