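I'm trying to simulate financial trades in Python using a vectorized approach. Part of this involves removing duplicate signals.
To elaborate, I've created a buy_signal column and a sell_signal column. These columns contain booleans in the form of 1s and 0s.
Reading the signals from the top down, I don't want a second buy_signal to fire before a sell_signal has fired, i.e. while a "position" is open. The same goes for sell signals: if the "position" is closed, I don't want duplicate sell signals. If sell_signal and buy_signal are both 1, set them both to 0.
What is the best way to remove these irrelevant signals?
Here is an example: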
import pandas as pd

df = pd.DataFrame(
    {
        "buy_signal": [1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        "sell_signal": [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    }
)
print(df)
   buy_signal  sell_signal
0           1            0
1           1            0
2           1            1
3           1            1
4           0            1
5           0            0
6           1            0
7           1            0
8           1            1
9           0            0
This is the result I want:
   buy_signal  sell_signal
0           1            0
1           0            0
2           0            1
3           0            0
4           0            0
5           0            0
6           1            0
7           0            0
8           0            1
9           0            0
Best answer
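As I said earlier (in a comment on a since-deleted answer), one has to consider the interaction between the buy and sell signals; each signal cannot simply be processed on its own.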
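The key idea is to consider a quantity q (the "position"), i.e. the amount currently held, which the OP wants to keep within [0, 1]. This quantity is cumsum(buy - sell) computed on the cleaned signals.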
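A quick way to see why the limit matters is to compare, on the example from the question, the plain running sum of buy - sell with the same sum clipped to [0, 1] after every step. The pure-Python sketch below uses itertools.accumulate with its initial= keyword (Python 3.8+) purely for illustration; it is not the answer's implementation and is much slower than a compiled loop on large inputs.

```python
from itertools import accumulate

# Example signals from the question
buy = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
sell = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
diff = [b - s for b, s in zip(buy, sell)]

# Plain running sum of buy - sell: the "position" drifts above 1
naive = list(accumulate(diff))

# Running sum clipped to [0, 1] after every step: this is q
# (initial=0 prepends the starting position, which [1:] drops again)
q = list(accumulate(diff, lambda c, d: min(max(c + d, 0), 1), initial=0))[1:]

# Executed trades are the step-to-step changes of q, starting from 0
trans = [cur - prev for prev, cur in zip([0] + q, q)]

print(naive)  # [1, 2, 2, 2, 1, 1, 2, 3, 3, 3]
print(q)      # [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
print(trans)  # [1, 0, 0, 0, -1, 0, 1, 0, 0, 0]
```

The plain sum reaches 3, i.e. it keeps buying while a position is already open, whereas the clipped q never leaves {0, 1}; its differences give one buy at rows 0 and 6 and one sell at row 4.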
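The problem therefore reduces to a "cumulative sum with limits" (a clipped cumulative sum). Unfortunately, in general this cannot be done in a vectorized way with numpy or pandas, because each step depends on the already-clipped value of the previous step, but it can be coded quite efficiently with numba. The code below processes 1 million rows in about 37 ms.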
import numpy as np
from numba import njit


@njit
def cumsum_clip(a, xmin=-np.inf, xmax=np.inf):
    res = np.empty_like(a)
    c = 0
    for i in range(len(a)):
        c = min(max(c + a[i], xmin), xmax)
        res[i] = c
    return res


def clean_buy_sell(df, xmin=0, xmax=1):
    # model the quantity held: cumulative sum of buy-sell clipped in
    # [xmin, xmax]
    # note that, when buy and sell are equal, there is no change
    q = cumsum_clip(
        (df['buy_signal'] - df['sell_signal']).values,
        xmin=xmin, xmax=xmax)
    # derive actual transactions: positive for buy, negative for sell, 0 for hold
    trans = np.diff(np.r_[0, q])
    df = df.assign(
        buy_signal=np.clip(trans, 0, None),
        sell_signal=np.clip(-trans, 0, None),
    )
    return df
Now:
df = pd.DataFrame(
    {
        "buy_signal": [1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        "sell_signal": [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    }
)
new_df = clean_buy_sell(df)
>>> new_df
   buy_signal  sell_signal
0           1            0
1           0            0
2           0            0
3           0            0
4           0            1
5           0            0
6           1            0
7           0            0
8           0            0
9           0            0
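For the specific case in the question (0/1 signals, limits [0, 1]) the clipped sum has a simple structure: a row with buy - sell = 1 always sets the position to 1, a row with buy - sell = -1 always sets it to 0, and a row with buy - sell = 0 (including buy = sell = 1) leaves it unchanged. So this special case can be expressed in plain pandas as a forward fill. The sketch below uses a hypothetical name, clean_buy_sell_binary, and assumes exactly that input; it does not handle fractional values or other limits, which still need the clipped loop.

```python
import numpy as np
import pandas as pd


def clean_buy_sell_binary(df):
    """Hypothetical vectorized variant; assumes 0/1 signals and limits [0, 1]."""
    d = df["buy_signal"] - df["sell_signal"]  # values in {-1, 0, 1}
    # Net buy -> position 1, net sell -> position 0, no net change -> NaN
    q = d.map({1: 1, -1: 0})
    # Carry the last known position forward; start flat (0)
    q = q.ffill().fillna(0).astype(int).to_numpy()
    # Executed trades are the changes in position, starting from 0
    trans = np.diff(q, prepend=0)
    return df.assign(
        buy_signal=np.clip(trans, 0, None),
        sell_signal=np.clip(-trans, 0, None),
    )


df = pd.DataFrame(
    {
        "buy_signal": [1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        "sell_signal": [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    }
)
print(clean_buy_sell_binary(df))
```

On this example it should reproduce the new_df output of clean_buy_sell. The trade-off is generality: the forward fill relies on every step landing exactly on a limit, which is only true for 0/1 signals with limits [0, 1].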
Speed and correctness
n = 1_000_000
np.random.seed(0) # repeatable example
df = pd.DataFrame(np.random.choice([0, 1], (n, 2)),
                  columns=['buy_signal', 'sell_signal'])
%timeit clean_buy_sell(df)
37.3 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Correctness test:
z = clean_buy_sell(df)
q = (z['buy_signal'] - z['sell_signal']).cumsum()
# q is quantity held through time; must be in {0, 1}
assert q.isin({0, 1}).all()
# we should not have introduced any new buy signal:
# check that any buy == 1 in z was also 1 in df
assert not (z['buy_signal'] & ~df['buy_signal']).any()
# same for sell signal:
assert not (z['sell_signal'] & ~df['sell_signal']).any()
# finally, buy and sell should never be 1 on the same row:
assert not (z['buy_signal'] & z['sell_signal']).any()
Bonus: other limits, partial buys and sells
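For fun, we can consider the more general case where the buy and sell values are fractional (or any float values) and the limits are not [0, 1]. The current version of clean_buy_sell needs no changes: it is general enough to handle these cases.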
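To make "partial" concrete, here is a small pure-Python loop, a hypothetical helper named clip_trades that mirrors the logic of cumsum_clip followed by the np.diff step, without numba or pandas. It runs a few hand-picked fractional signals with the limits [-1, 2]; the values are exact binary fractions so the printed result has no rounding noise.

```python
def clip_trades(buy, sell, xmin, xmax):
    """Hypothetical helper: executed buys/sells when the position is clipped to [xmin, xmax]."""
    pos = 0.0
    out_buy, out_sell = [], []
    for b, s in zip(buy, sell):
        new_pos = min(max(pos + b - s, xmin), xmax)  # clipped running position
        trade = new_pos - pos                         # > 0 buy, < 0 sell
        out_buy.append(max(trade, 0.0))
        out_sell.append(max(-trade, 0.0))
        pos = new_pos
    return out_buy, out_sell


buy = [0.75, 1.0, 0.5, 0.0, 0.0]
sell = [0.0, 0.0, 0.0, 2.5, 0.75]
print(clip_trades(buy, sell, -1, 2))
# ([0.75, 1.0, 0.25, 0.0, 0.0], [0.0, 0.0, 0.0, 2.5, 0.5])
```

The third buy of 0.5 is only partly executed (0.25) because the position is already 1.75 and is capped at 2, and the final sell of 0.75 is cut to 0.5 once the position reaches -1.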
np.random.seed(0)
df = pd.DataFrame(
    np.random.uniform(0, 1, (100, 2)),
    columns=['buy_signal', 'sell_signal'],
)
# set limits to -1, 2: we can sell short (borrow) up to 1 unit
# and own up to 2 units.
z = clean_buy_sell(df, -1, 2)
(z['buy_signal'] - z['sell_signal']).cumsum().plot()
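The correctness asserts in the speed section rely on integer 0/1 values (isin({0, 1}) and the bitwise ~), so they do not carry over to fractional signals. A possible tolerance-based variant, written as a hypothetical helper check_cleaned, checks that the cumulative position stays within [xmin, xmax], that each executed buy or sell is non-negative and never exceeds the net amount requested on that row, and that no row both buys and sells. The small frames at the end are hand-computed illustrative data, not output captured from the answer's code.

```python
import numpy as np
import pandas as pd


def check_cleaned(df, z, xmin, xmax, tol=1e-9):
    """Return True if z looks like a valid cleaning of df (hypothetical helper)."""
    # Net requested trade per row in the original signals
    d = (df["buy_signal"] - df["sell_signal"]).to_numpy(dtype=float)
    zb = z["buy_signal"].to_numpy(dtype=float)
    zs = z["sell_signal"].to_numpy(dtype=float)
    # Position implied by the cleaned signals
    pos = np.cumsum(zb - zs)
    checks = [
        # position stays within the limits (up to rounding)
        np.all(pos >= xmin - tol) and np.all(pos <= xmax + tol),
        # trades are non-negative and never exceed the net request
        np.all(zb >= -tol) and np.all(zb <= np.clip(d, 0, None) + tol),
        np.all(zs >= -tol) and np.all(zs <= np.clip(-d, 0, None) + tol),
        # a row never buys and sells at the same time
        not np.any((zb > tol) & (zs > tol)),
    ]
    return all(bool(c) for c in checks)


df = pd.DataFrame({"buy_signal": [0.75, 1.0, 0.5, 0.0, 0.0],
                   "sell_signal": [0.0, 0.0, 0.0, 2.5, 0.75]})
# Hand-computed cleaning of df with limits [-1, 2]
z = pd.DataFrame({"buy_signal": [0.75, 1.0, 0.25, 0.0, 0.0],
                  "sell_signal": [0.0, 0.0, 0.0, 2.5, 0.5]})
# An invalid "cleaning" that keeps every buy and drops the sells (position reaches 2.25)
z_bad = df.assign(sell_signal=0.0)
print(check_cleaned(df, z, -1, 2))      # True
print(check_cleaned(df, z_bad, -1, 2))  # False
```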
Regarding Python Pandas : Vectorized Way of Cleaning Buy and Sell Signals, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75349276/