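I wrote a factor analysis using the alphalens module. It handles a DataFrame with 70 columns perfectly, but fails when I try one with 1780 columns.
I don't see how this is possible, since the structure is exactly the same. I have checked everything, but the magic stops inside alphalens.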
https://github.com/Ibsylonne/test_alphalens
If you have any clues or ideas, please comment below.
Works:
factor = pd.read_csv('original70columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()
date
1996-12-31 DU UH Equity 0.0
SCL LN Equity 0.0
BMA AR Equity 0.0
GCLA AR Equity 0.0
EBS AV Equity 0.0
dtype: float64
Does not work:
factor = pd.read_csv('test1780columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()
date
1996-12-31 DU UH Equity 0.0
SCL LN Equity 0.0
BMA AR Equity 0.0
GCLA AR Equity 0.0
EBS AV Equity 0.0
dtype: float64
For those familiar with alphalens (this is the attempt with 1780 columns):
factor_data = get_clean_factor_and_forward_returns(
factor,
prices,
quantiles=2,
periods=(1, 5, 10,),
max_loss=1)
TypeError: unsupported operand type(s) for /: 'str' and 'float'
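Rather mysterious...
Any clues or ideas are welcome in the comments below. Thanks!
Best answer
Update: I think I fixed the NaN issue by replacing NaN with 0, but I still get an error. I'm still looking into it, but maybe what I've done so far will give you some ideas. Let me know what you think: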
from numpy import nan
from pandas import (DataFrame, date_range)
import pandas as pd
import matplotlib.pyplot as plt
from alphalens.tears import (create_returns_tear_sheet,
                             create_information_tear_sheet,
                             create_turnover_tear_sheet,
                             create_summary_tear_sheet,
                             create_full_tear_sheet,
                             create_event_returns_tear_sheet,
                             create_event_study_tear_sheet)
from alphalens.utils import get_clean_factor_and_forward_returns
# build price
# Added skip for testing, it can be removed
skip=False
prices = pd.read_csv('prices_quant.csv', delimiter=';')
prices['date'] = pd.to_datetime(prices.date)
prices = prices.set_index('date')
prices = prices.fillna(0)
print(prices)
factor = pd.read_csv('test1.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor = factor.fillna(0)
print(factor)
try:
    factor_data = get_clean_factor_and_forward_returns(
        factor,
        prices,
        quantiles=5,
        periods=(1, 5, 10),
        max_loss=1)
except Exception as e:
    # Print the error and skip the tear sheets instead of crashing
    print(e)
    skip = True
if not skip:
    create_full_tear_sheet(factor_data, long_short=True)
    create_event_returns_tear_sheet(factor_data, prices, long_short=True)
    print("\nNo Errors\n")
else:
    print("\nWe encountered an error\n")
DU UH Equity SCL LN Equity BMA AR Equity GCLA AR Equity EBS AV Equity OMV AV Equity ... RDF SJ Equity HYP SJ Equity AEL SJ Equity MRP SJ Equity EMI SJ Equity AXL SJ Equity
date ...
1996-12-31 0.00000 0.0 9.35256 0.00000 0.00000 11.11470 ... 0.00000 1.25522 1.06860 0.75895 0.00000 0.36700
1997-01-30 0.00000 0.0 9.68044 0.00000 0.00000 11.34016 ... 0.00000 1.25426 1.23754 0.74472 0.00000 0.40870
1997-02-27 0.00000 0.0 9.99271 0.00000 0.00000 11.75658 ... 0.00000 1.21265 1.25000 0.58482 0.00000 0.49981
1997-03-31 0.00000 0.0 11.00760 0.00000 0.00000 11.82128 ... 0.00000 1.27312 1.60597 0.73513 0.00000 0.42375
1997-04-30 0.00000 0.0 10.81243 0.00000 0.00000 10.88544 ... 0.00000 1.24338 1.73112 0.79811 0.00000 0.46649
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-08-30 1.39123 63.4 4.34430 1.35635 39.73607 52.90799 ... 0.70425 6.94043 1.07509 15.33290 1.08665 0.04015
2018-09-30 1.36945 61.7 4.18581 1.66410 41.55489 56.20015 ... 0.70719 6.51431 1.15394 16.11004 1.05302 0.03670
2018-10-31 1.33406 51.9 4.43810 1.61537 40.70160 55.54638 ... 0.64929 6.11171 1.18551 15.63778 1.00880 0.03318
2018-11-29 1.35312 46.3 4.49952 1.32456 39.43278 50.48753 ... 0.68896 6.40823 1.27159 17.31443 1.06684 0.03305
2018-12-31 1.36956 46.3 4.35219 1.31402 33.23611 43.76183 ... 0.67241 5.66712 1.25163 17.11610 1.02912 0.02990
[265 rows x 1780 columns]
date
1996-12-31 DU UH Equity 0.000000
SCL LN Equity 0.000000
BMA AR Equity 0.000000
GCLA AR Equity 0.000000
EBS AV Equity 0.000000
...
2018-12-31 HYP SJ Equity 0.029605
AEL SJ Equity 0.000777
MRP SJ Equity 0.000000
EMI SJ Equity 0.000000
AXL SJ Equity 0.000000
Length: 471700, dtype: float64
unsupported operand type(s) for /: 'int' and 'str'
We encountered an error
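Let me know if this helps.
Original post: Some of the prices are NaN, and I wonder whether that is causing the problem. Do you know whether changing them to 0 would have any effect? I'm not sure, but since the error is a division involving a string, I think this could be the root cause. It's only a guess, though: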
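One way to follow up on the string-division error: the message suggests that at least one cell is a Python str, so it may help to list the columns pandas did not parse as numeric. This is only a sketch. `find_non_numeric_cells` is a hypothetical helper, and the tiny `demo` frame (including its comma-decimal value) is made-up data standing in for the real CSV:

```python
import pandas as pd

def find_non_numeric_cells(df):
    """Hypothetical helper: map each non-numeric column to its unparseable values.

    A column that is stored as text but parses cleanly is still reported,
    with an empty list.
    """
    problems = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # column was parsed as numbers, nothing to report
        # Values that cannot be converted become NaN under errors="coerce"
        coerced = pd.to_numeric(df[col], errors="coerce")
        bad = df[col][coerced.isna() & df[col].notna()]
        problems[col] = bad.unique().tolist()
    return problems

# Made-up data: "BBB" contains one value written with a decimal comma
demo = pd.DataFrame({"AAA": [1.0, 2.0], "BBB": ["3,5", 4.0]})
report = find_non_numeric_cells(demo)
print(report)
```

Running the same helper on the real `prices` frame (and on the unstacked factor frame) should show whether any of the 1780 columns were read as text. If the cause turns out to be a decimal comma, which is an assumption here, the `decimal` argument of `pd.read_csv` may be worth a try.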
$ python3 code_alphalens_analysis
DU UH Equity SCL LN Equity BMA AR Equity GCLA AR Equity EBS AV Equity OMV AV Equity ... RDF SJ Equity HYP SJ Equity AEL SJ Equity MRP SJ Equity EMI SJ Equity AXL SJ Equity
date ...
1996-12-31 NaN NaN 9.35256 NaN NaN 11.11470 ... NaN 1.25522 1.06860 0.75895 NaN 0.36700
1997-01-30 NaN NaN 9.68044 NaN NaN 11.34016 ... NaN 1.25426 1.23754 0.74472 NaN 0.40870
1997-02-27 NaN NaN 9.99271 NaN NaN 11.75658 ... NaN 1.21265 1.25000 0.58482 NaN 0.49981
1997-03-31 NaN NaN 11.00760 NaN NaN 11.82128 ... NaN 1.27312 1.60597 0.73513 NaN 0.42375
1997-04-30 NaN NaN 10.81243 NaN NaN 10.88544 ... NaN 1.24338 1.73112 0.79811 NaN 0.46649
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-08-30 1.39123 63.4 4.34430 1.35635 39.73607 52.90799 ... 0.70425 6.94043 1.07509 15.33290 1.08665 0.04015
2018-09-30 1.36945 61.7 4.18581 1.66410 41.55489 56.20015 ... 0.70719 6.51431 1.15394 16.11004 1.05302 0.03670
2018-10-31 1.33406 51.9 4.43810 1.61537 40.70160 55.54638 ... 0.64929 6.11171 1.18551 15.63778 1.00880 0.03318
2018-11-29 1.35312 46.3 4.49952 1.32456 39.43278 50.48753 ... 0.68896 6.40823 1.27159 17.31443 1.06684 0.03305
2018-12-31 1.36956 46.3 4.35219 1.31402 33.23611 43.76183 ... 0.67241 5.66712 1.25163 17.11610 1.02912 0.02990
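To illustrate the guess above: NaN is itself a float, so dividing it does not raise a TypeError, while a str operand produces exactly this kind of message. A minimal sketch, assuming that some cells were read as text. `coerce_prices` is a hypothetical helper and the sample values are invented:

```python
import pandas as pd

# NaN is a float, so arithmetic with it works (the result is just NaN).
nan_result = float("nan") / 2.0

# A string operand raises the same kind of TypeError as in the question.
text_value = "1.5"
try:
    text_value / 2.0
except TypeError as exc:
    message = str(exc)

def coerce_prices(prices):
    """Hypothetical helper: make every column numeric; unparseable cells become NaN."""
    return prices.apply(pd.to_numeric, errors="coerce")

# Invented sample: column "BBB" mixes text and numbers, so its dtype is object
raw = pd.DataFrame(
    {"AAA": [1.0, 2.0, 4.0], "BBB": ["n/a", "3.0", 6.0]},
    index=pd.to_datetime(["2018-10-31", "2018-11-29", "2018-12-31"]),
)
clean = coerce_prices(raw)

# Simple one-period returns now compute without a TypeError
returns = clean / clean.shift(1) - 1
print(message)
print(clean.dtypes)
```

Note that replacing NaN with 0 leaves any text cells untouched, and a price of 0 turns later return calculations into divisions by zero, so it may be safer to keep NaN and coerce the columns instead.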
Regarding python - code works on a DataFrame with 70 columns but fails on one with 1000 columns (same data structure), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58782880/