Python Pandas : Vectorized Way of Cleaning Buy and Sell Signals

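I am trying to simulate financial trades in Python using a vectorized approach. Part of this involves removing duplicate signals.

To elaborate, I have built a buy_signal column and a sell_signal column. These columns hold boolean values in the form of 1s and 0s.

Reading the signals from top to bottom, I don't want a second buy_signal to fire before a sell_signal has fired, that is, while a "position" is still open. The same goes for sell signals: once the position is closed, I don't want repeated sell signals. If sell_signal and buy_signal are both 1, set both of them to 0.

What is the best way to remove these irrelevant signals?

Here is an example: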

import pandas as pd

df = pd.DataFrame(
    {
        "buy_signal": [1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        "sell_signal": [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    }
)
print(df)

   buy_signal  sell_signal
0           1            0
1           1            0
2           1            1
3           1            1
4           0            1
5           0            0
6           1            0
7           1            0
8           1            1
9           0            0

Here is the result I want:

   buy_signal  sell_signal
0           1            0
1           0            0
2           0            1
3           0            0
4           0            0
5           0            0
6           1            0
7           0            0
8           0            1
9           0            0

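Best answer

As I said earlier (in a comment on a since-deleted answer), one has to consider the interaction between the buy and sell signals, and cannot simply operate on each signal independently.

The key idea is to consider a quantity q (or "position"), the amount currently held, which the OP wants to keep within [0, 1]. That quantity is cumsum(buy - sell) after cleaning.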
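To see why the limit has to be applied at every step rather than once at the end, here is a minimal sketch in plain NumPy with a Python loop. The names `delta`, `naive`, and `running` are illustrative and do not come from the answer. It compares clipping an ordinary cumulative sum after the fact with clipping the running total on each row, using the data from the question.

```python
import numpy as np

# Signals from the question's example frame
buy = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 0])
sell = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
delta = buy - sell  # net requested change in position per row

# Naive attempt: ordinary cumulative sum, clipped afterwards.
# The unclipped sum drifts above 1, so later sells are hidden.
naive = np.clip(np.cumsum(delta), 0, 1)

# Running version: clip after every single step, so the
# position really stays inside [0, 1] at all times.
running = []
pos = 0
for d in delta:
    pos = min(max(pos + int(d), 0), 1)
    running.append(pos)
running = np.array(running)

print(naive)    # all ones: the sell on row 4 is lost
print(running)  # drops to 0 on rows 4 and 5
```

Only the running version goes back to a flat position on row 4, which is the behaviour the position model needs.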

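The problem therefore reduces to a "cumulative sum with limits". Unfortunately, this cannot be done in a vectorized way with numpy or pandas, but we can write fairly efficient code using numba. The code below processes 1 million rows in 37 ms.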

import numpy as np
from numba import njit

@njit
def cumsum_clip(a, xmin=-np.inf, xmax=np.inf):
    res = np.empty_like(a)
    c = 0
    for i in range(len(a)):
        c = min(max(c + a[i], xmin), xmax)
        res[i] = c
    return res


def clean_buy_sell(df, xmin=0, xmax=1):
    # model the quantity held: cumulative sum of buy-sell clipped in
    # [xmin, xmax]
    # note that, when buy and sell are equal, there is no change
    q = cumsum_clip(
        (df['buy_signal'] - df['sell_signal']).values,
        xmin=xmin, xmax=xmax)

    # derive actual transactions: positive for buy, negative for sell, 0 for hold
    trans = np.diff(np.r_[0, q])
    df = df.assign(
        buy_signal=np.clip(trans, 0, None),
        sell_signal=np.clip(-trans, 0, None),
    )
    
    return df
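The step from the position q back to cleaned signals can be traced by hand. The sketch below starts from the clipped position for the question's example (worked out manually here, so treat the hard-coded array as an assumption) and applies the same `np.diff` and `np.clip` calls as `clean_buy_sell`.

```python
import numpy as np

# Clipped position for the question's example, limits [0, 1]
q = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1])

# Prepend the starting position (0) so the first row gets a transaction too.
trans = np.diff(np.r_[0, q])

# Increases in position are buys; decreases are sells (made positive).
buy_clean = np.clip(trans, 0, None)
sell_clean = np.clip(-trans, 0, None)

print(trans)       # +1 on rows 0 and 6, -1 on row 4, 0 elsewhere
print(buy_clean)
print(sell_clean)
```

Because a row's transaction is a single signed number, a cleaned row can never contain both a buy and a sell.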

Now:

df = pd.DataFrame(
    {
        "buy_signal": [1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        "sell_signal": [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    }
)
new_df = clean_buy_sell(df)

>>> new_df
   buy_signal  sell_signal
0           1            0
1           0            0
2           0            0
3           0            0
4           0            1
5           0            0
6           1            0
7           0            0
8           0            0
9           0            0
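Note that this output is not identical to the desired table in the question. Rows where both signals are 1 (rows 2, 3 and 8) count as "no change" here, so the first sell lands on row 4 and the position opened on row 6 is never closed. The question's table is consistent with a different reading: a sell signal always closes an open position, and a buy only opens one when no sell fires on the same row. The sketch below implements that reading as a small state machine. It is not the answer's method; the function name `clean_signals_state_machine` and the rule itself are assumptions inferred from the question's table.

```python
import numpy as np

def clean_signals_state_machine(buy, sell):
    """Hypothetical variant: sell wins while holding; buy requires sell == 0."""
    buy_out = np.zeros_like(buy)
    sell_out = np.zeros_like(sell)
    holding = False
    for i in range(len(buy)):
        if holding:
            # An open position is closed by any sell signal.
            if sell[i] == 1:
                sell_out[i] = 1
                holding = False
        elif buy[i] == 1 and sell[i] == 0:
            # A flat position is opened only by an unambiguous buy.
            buy_out[i] = 1
            holding = True
    return buy_out, sell_out

buy = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 0])
sell = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
buy_out, sell_out = clean_signals_state_machine(buy, sell)

print(buy_out)   # buys on rows 0 and 6
print(sell_out)  # sells on rows 2 and 8
```

Which of the two behaviours is correct depends on how simultaneous buy and sell signals should be interpreted, which the question leaves open.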

Speed and correctness

n = 1_000_000

np.random.seed(0)  # repeatable example
df = pd.DataFrame(np.random.choice([0, 1], (n, 2)),
                  columns=['buy_signal', 'sell_signal'])

%timeit clean_buy_sell(df)
37.3 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
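The `%timeit` line above is an IPython magic. Outside IPython, a comparable measurement can be sketched with `time.perf_counter`. The sketch below times only the clipped cumulative sum on a smaller, hypothetical input of 100,000 rows, and falls back to plain Python when numba is not installed (an assumption made so the snippet runs anywhere; the fallback is much slower). A warm-up call keeps numba's one-time compilation out of the measurement.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same loop uncompiled.
    def njit(func):
        return func

@njit
def cumsum_clip(a, xmin, xmax):
    res = np.empty_like(a)
    c = 0
    for i in range(len(a)):
        c = min(max(c + a[i], xmin), xmax)
        res[i] = c
    return res

rng = np.random.default_rng(0)
n = 100_000
delta = rng.integers(0, 2, n) - rng.integers(0, 2, n)  # values in {-1, 0, 1}

cumsum_clip(delta[:10], 0, 1)  # warm-up: triggers compilation when numba is present

start = time.perf_counter()
q = cumsum_clip(delta, 0, 1)
elapsed = time.perf_counter() - start
print(f"{elapsed * 1e3:.2f} ms for {len(delta):,} rows")
```

The absolute number depends on the machine, so it will not match the 37 ms figure exactly.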

Correctness test:

z = clean_buy_sell(df)
q = (z['buy_signal'] - z['sell_signal']).cumsum()

# q is quantity held through time; must be in {0, 1}
assert q.isin({0, 1}).all()

# we should not have introduced any new buy signal:
# check that any buy == 1 in z was also 1 in df

assert not (z['buy_signal'] & ~df['buy_signal']).any()

# same for sell signal:
assert not (z['sell_signal'] & ~df['sell_signal']).any()

# finally, buy and sell should never be 1 on the same row:
assert not (z['buy_signal'] & z['sell_signal']).any()
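The `z & ~df` checks above rely on bitwise arithmetic on integer 0/1 columns, which can look surprising at first. A small sketch with made-up arrays (`original` and `cleaned` are illustrative names) shows that the expression is 1 only where the cleaned signal is 1 and the original was 0, and that an explicit comparison gives the same flags.

```python
import numpy as np

original = np.array([0, 0, 1, 1])
cleaned = np.array([0, 1, 0, 1])

# On integers, ~ is bitwise NOT: ~0 == -1 (all bits set) and ~1 == -2
# (all bits set except the lowest). AND-ing with a 0/1 value therefore
# keeps the lowest bit only where original == 0.
flag = cleaned & ~original

# Equivalent check spelled out with comparisons
explicit = (cleaned == 1) & (original == 0)

print(flag)      # 1 only in the second position
print(explicit)  # True only in the second position
```

The explicit form is arguably easier to read and also works if the columns are stored as floats, where `~` is not defined.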

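Bonus: other limits, partial buys and sells

For fun, we can consider the more general case where the buy and sell values are fractional (or any float values) and the limits are not [0, 1]. The current version of clean_buy_sell needs no changes; it is generic enough to handle these cases.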

np.random.seed(0)
df = pd.DataFrame(
    np.random.uniform(0, 1, (100, 2)),
    columns=['buy_signal', 'sell_signal'],
)

# set limits to -1, 2: we can sell short (borrow) up to 1 unit
# and own up to 2 units.
z = clean_buy_sell(df, -1, 2)

(z['buy_signal'] - z['sell_signal']).cumsum().plot()
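A small deterministic case makes the partial fills easier to see than the random plot. The sketch below uses made-up amounts and a plain-Python stand-in for `cumsum_clip` (named `cumsum_clip_py` here, an assumption so the snippet runs without numba), with the same limits of -1 and 2.

```python
import numpy as np

def cumsum_clip_py(a, xmin, xmax):
    """Plain-Python stand-in for the answer's numba cumsum_clip."""
    res = np.empty_like(a, dtype=float)
    c = 0.0
    for i, step in enumerate(a):
        c = min(max(c + step, xmin), xmax)
        res[i] = c
    return res

# Requested amounts (hypothetical data)
buy = np.array([1.5, 1.0, 0.0, 0.0, 0.5])
sell = np.array([0.0, 0.0, 2.0, 2.0, 0.0])

q = cumsum_clip_py(buy - sell, xmin=-1, xmax=2)
trans = np.diff(np.r_[0, q])
buy_clean = np.clip(trans, 0, None)
sell_clean = np.clip(-trans, 0, None)

print(q)           # position: 1.5, 2.0, 0.0, -1.0, -0.5
print(buy_clean)   # row 1: only 0.5 of the requested 1.0 is bought
print(sell_clean)  # row 3: only 1.0 of the requested 2.0 is sold
```

On row 1 the upper limit of 2 truncates the buy, and on row 3 the lower limit of -1 truncates the sell; every other request is filled in full.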

Regarding "Python Pandas : Vectorized Way of Cleaning Buy and Sell Signals", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75349276/
