python - Finding the most profitable long/short pair permutations in Python - an optimization problem?

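I have daily profit data and I'm trying to find the combination of two assets that returns the highest profit. I need to buy one asset long and short another, and find the best-performing pair over a window of time.

I can accomplish this by searching through all the permutations, but it's extremely slow. (No surprise there.) I think this might be the type of problem suited to linear optimization with a library like PuLP.

Here's an example of solving the problem exhaustively. I'm intentionally keeping the data simple, but I have 1000 assets to search through. It took about 45 minutes to finish using the inefficient, manual method I've outlined below.

Note: because going long "Alpha" and short "Bravo" is different from going long "Bravo" and short "Alpha", I'm using permutations, not combinations.

Edit: in case some people aren't familiar with going long and short, I'm trying to pair the highest profit with the lowest profit (when shorting, the more negative the value, the more profit I make).

The logic would look like this: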

For all the permutations of nodes, add the node one profit to the inverse of the node two profit to get a total profit. Find the pair that has the highest total profit.

Here's my very inefficient (but working) implementation:

    import pandas as pd

    # Sample data
    profits = [
        ('2019-11-18', 'Alpha', -79.629698),
        ('2019-11-19', 'Alpha', -17.452517),
        ('2019-11-20', 'Alpha', -19.069558),
        ('2019-11-21', 'Alpha', -66.061564),
        ('2019-11-18', 'Bravo', -87.698670),
        ('2019-11-19', 'Bravo', -73.812616),
        ('2019-11-20', 'Bravo', 198.513246),
        ('2019-11-21', 'Bravo', -69.579466),
        ('2019-11-18', 'Charlie', 66.302287),
        ('2019-11-19', 'Charlie', -16.132065),
        ('2019-11-20', 'Charlie', -123.735898),
        ('2019-11-21', 'Charlie', -30.046416),
        ('2019-11-18', 'Delta', -131.682322),
        ('2019-11-19', 'Delta', 13.296473),
        ('2019-11-20', 'Delta', 23.595053),
        ('2019-11-21', 'Delta', 14.103027),
    ]

    profits_df = pd.DataFrame(profits, columns=('Date', 'Node', 'Profit')).sort_values('Date')

profits_df looks like this:

+----+------------+---------+-------------+
|    | Date       | Node    | Profit      |
+----+------------+---------+-------------+
|  0 | 2019-11-18 | Alpha   |  -79.629698 |
|  4 | 2019-11-18 | Bravo   |  -87.698670 |
|  8 | 2019-11-18 | Charlie |   66.302287 |
| 12 | 2019-11-18 | Delta   | -131.682322 |
|  1 | 2019-11-19 | Alpha   |  -17.452517 |
+----+------------+---------+-------------+

To solve the problem manually I can do this:

    import itertools

    date_dfs = []

    # I needed a way to take my rows and combine them pairwise; this
    # is kind of gross but it does work
    for date, date_df in profits_df.groupby('Date'):
        tuples = [tuple(x) for x in date_df[['Node', 'Profit']].to_numpy()]
        pp = list(itertools.permutations(tuples, 2))
        # flatten ((long, lp), (short, sp)) into [long, lp, short, sp]
        flat_pp = [[p[0][0], p[0][1], p[1][0], p[1][1]] for p in pp]
        df = pd.DataFrame(flat_pp, columns=['Long', 'LP', 'Short', 'SP'])
        date_dfs.append(df)

    result_df = pd.concat(date_dfs)
    result_df['Pair'] = result_df['Long'] + '/' + result_df['Short']
    # pair profit = long profit minus short profit
    result_df['Profit'] = result_df['LP'] + result_df['SP'].multiply(-1)
    result_df.groupby('Pair')['Profit'].sum().sort_values(ascending=False)

By computing the profit of every permutation for each day and then summing them up, I get the following result:

+-----------------------------+
| Pair                        |
+-----------------------------+
| Bravo/Alpha      149.635831 |
| Delta/Alpha      101.525568 |
| Charlie/Alpha     78.601245 |
| Bravo/Charlie     71.034586 |
| Bravo/Delta       48.110263 |
| Delta/Charlie     22.924323 |
| Charlie/Delta    -22.924323 |
| Delta/Bravo      -48.110263 |
| Charlie/Bravo    -71.034586 |
| Alpha/Charlie    -78.601245 |
| Alpha/Delta     -101.525568 |
| Alpha/Bravo     -149.635831 |
+-----------------------------+
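As a sanity check on the stated logic, the top and bottom rows of that result can be reproduced by hand for a single pair. This is only a small sketch using the Alpha and Bravo values from the sample data; the variable names are made up for illustration.

```python
# Daily profits for two of the sample assets, in date order
# (2019-11-18 through 2019-11-21), copied from the sample data.
alpha = [-79.629698, -17.452517, -19.069558, -66.061564]
bravo = [-87.698670, -73.812616, 198.513246, -69.579466]

# Long Bravo / short Alpha: each day contributes the long profit
# plus the inverse of the short profit.
daily = [long_p + (-1 * short_p) for long_p, short_p in zip(bravo, alpha)]
bravo_alpha = sum(daily)

# Swapping the roles flips the sign of every daily term,
# so the reverse pair is the exact negation.
alpha_bravo = sum(long_p - short_p for long_p, short_p in zip(alpha, bravo))

print(round(bravo_alpha, 6), round(alpha_bravo, 6))
```

Up to floating-point rounding, the two printed values should match the Bravo/Alpha and Alpha/Bravo rows.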

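I'm sure there's a more efficient way to solve this. I don't understand the intricacies of optimization, but I know enough about it to know it's a possible solution. I don't understand the difference between linear and non-linear optimization, so I apologize if I've got the terminology wrong.

Can anyone suggest an approach I should try?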

Best answer

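A summary of what I did:

Create a dictionary from the list of profits, keyed by date.

Run permutations over each key's list of (name, amount) values.

Iterate through each pair to build the combined name and the combined amount separately.

Sort the container list by name, group by name, sum the amounts in each group, and load the final results into a dictionary.

Read that dictionary into a dataframe and sort the values by profit in descending order.

I believe all the processing should be done before it goes into the dataframe, and you should get a significant speedup: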

    from collections import defaultdict
    from operator import itemgetter
    from itertools import permutations, groupby

    import pandas as pd

    # group the (name, amount) tuples by date
    d = defaultdict(list)
    for k, v, s in profits:
        d[k].append((v, s))

    container = []
    for k, v in d.items():
        l = permutations(v, 2)
        # here I combine the names and the amounts separately into A and B
        for i, j in l:
            A = i[0] + '_' + j[0]
            B = i[-1] + (j[-1] * -1)
            container.append([A, B])

    # here I sort the list, then groupby (groupby won't work if you don't sort first)
    container = sorted(container, key=itemgetter(0, 1))

    sam = dict()
    for name, amount in groupby(container, key=itemgetter(0)):
        sam[name] = sum(i[-1] for i in amount)

    outcome = (
        pd.DataFrame
        .from_dict(sam, orient='index', columns=['Profit'])
        .sort_values(by='Profit', ascending=False)
    )

                       Profit
    Bravo_Alpha    149.635831
    Delta_Alpha    101.525568
    Charlie_Alpha   78.601245
    Bravo_Charlie   71.034586
    Bravo_Delta     48.110263
    Delta_Charlie   22.924323
    Charlie_Delta  -22.924323
    Delta_Bravo    -48.110263
    Charlie_Bravo  -71.034586
    Alpha_Charlie  -78.601245
    Alpha_Delta   -101.525568
    Alpha_Bravo   -149.635831

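When I ran it on my PC it took 1.24 ms, versus 14.1 ms for yours. Hopefully someone can come up with something even faster.

Update:

Everything I did in the first version was unnecessary. There is no need for permutations, because the multiplier is -1. That means all we need to do is get the sum for each name, pair the names (without repeats), multiply one of the values by -1 and add it to the other; then, once we have the sum for a pair of names, multiply it by -1 again to get the reverse pair. I got a time of about 18.6 μs, which rises to 273 μs once pandas is introduced. That is a significant speedup. Most of the compute time goes into reading the data into pandas. Here it is: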

    from collections import defaultdict
    from operator import itemgetter
    from itertools import combinations, chain
    import pandas as pd

    def optimizer(profits):
        nw = defaultdict(list)
        for dat, node, profit in profits:
            nw[node].append(profit)

        # sum the total for each key
        B = {key: sum(value) for key, value in nw.items()}

        # multiply the value of the second item in the tuple by -1
        # add that to the value of the first item in the tuple
        # pair the result back to the tuple and form a dict
        sumr = {(first, last): sum((B[first], B[last] * -1))
                for first, last in combinations(B.keys(), 2)}

        # reverse the positions in the tuple for each key
        # multiply the value by -1 and pair to form a dict
        rev = {tuple(reversed(k)): v * -1 for k, v in sumr.items()}

        # join the two dictionaries into one
        # sort in descending order
        # and create a dictionary
        result = dict(sorted(chain(sumr.items(), rev.items()),
                             key=itemgetter(-1),
                             reverse=True))

        # load into pandas
        # trying to reduce the compute time here by reducing pandas workload
        return pd.DataFrame(list(result.values()),
                            index=list(result.keys()))
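The reason the per-day permutations can be dropped is worth spelling out. Writing $p_{d,a}$ for the profit of asset $a$ on day $d$ and $T_a$ for its total over the window (notation introduced here for illustration, not taken from the question), and assuming every asset has a value on every date as in the sample data:

```latex
\begin{aligned}
\operatorname{Profit}(L, S)
  &= \sum_{d} \bigl(p_{d,L} - p_{d,S}\bigr)
   = \sum_{d} p_{d,L} - \sum_{d} p_{d,S}
   = T_L - T_S, \\
\operatorname{Profit}(S, L)
  &= T_S - T_L = -\operatorname{Profit}(L, S), \\
\max_{L \neq S} \operatorname{Profit}(L, S)
  &= \max_{a} T_a - \min_{a} T_a .
\end{aligned}
```

With the sample data, $T_{\text{Bravo}} = -32.577506$ and $T_{\text{Alpha}} = -182.213337$, which gives $149.635831$ for Bravo/Alpha. If some assets are missing on some dates, the per-day approach only counts days where both assets appear, so the two methods would no longer agree.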

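I would probably hold off on reading into a dataframe until it is unavoidable. I'd love to know what the actual speed is when you run it on your end.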
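If only the best pair (or the top few) is needed rather than the full ranking, the same per-name totals allow a shortcut that never builds all the pairs. The following is a sketch of that idea, not part of the answer as posted: the function names `total_by_node`, `best_pair`, and `top_pairs` are hypothetical, and it assumes every asset has a profit on every date.

```python
from collections import defaultdict
from heapq import nlargest
from itertools import permutations

# Sample data from the question.
profits = [
    ('2019-11-18', 'Alpha', -79.629698), ('2019-11-19', 'Alpha', -17.452517),
    ('2019-11-20', 'Alpha', -19.069558), ('2019-11-21', 'Alpha', -66.061564),
    ('2019-11-18', 'Bravo', -87.698670), ('2019-11-19', 'Bravo', -73.812616),
    ('2019-11-20', 'Bravo', 198.513246), ('2019-11-21', 'Bravo', -69.579466),
    ('2019-11-18', 'Charlie', 66.302287), ('2019-11-19', 'Charlie', -16.132065),
    ('2019-11-20', 'Charlie', -123.735898), ('2019-11-21', 'Charlie', -30.046416),
    ('2019-11-18', 'Delta', -131.682322), ('2019-11-19', 'Delta', 13.296473),
    ('2019-11-20', 'Delta', 23.595053), ('2019-11-21', 'Delta', 14.103027),
]


def total_by_node(profits):
    """Sum the profit of each node over all dates."""
    totals = defaultdict(float)
    for _date, node, profit in profits:
        totals[node] += profit
    return dict(totals)


def best_pair(totals):
    """Return (long, short, profit): long the largest total, short the smallest."""
    long_node = max(totals, key=totals.get)
    short_node = min(totals, key=totals.get)
    return long_node, short_node, totals[long_node] - totals[short_node]


def top_pairs(totals, k=3):
    """Return the k most profitable (long, short, profit) triples.

    Still visits every ordered pair, but keeps only k of them in memory.
    """
    ranked = nlargest(k, permutations(totals, 2),
                      key=lambda pair: totals[pair[0]] - totals[pair[1]])
    return [(lng, sht, totals[lng] - totals[sht]) for lng, sht in ranked]


totals = total_by_node(profits)
print(best_pair(totals))
print(top_pairs(totals))
```

`best_pair` needs one pass over the totals, so its cost grows linearly with the number of assets, while `top_pairs` still scans roughly a million ordered pairs for 1000 assets without sorting or storing all of them.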

This question and its answer can be found on Stack Overflow: https://stackoverflow.com/questions/59870886/
