python - Code works for a DataFrame with 70 columns but fails with 1000 columns (same data structure)

Tags: python dataframe factor-analysis

I wrote a factor analysis using the alphalens module. It works perfectly with 70 columns in the DataFrame, but fails when I try to use 1780 columns...

I don't see how this is possible, since the structure is exactly the same. I've checked everything, but the magic stops inside alphalens.

https://github.com/Ibsylonne/test_alphalens

If you have any leads or ideas, please leave a comment below.

Works:

factor = pd.read_csv('original70columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()

date                      
1996-12-31  DU UH Equity      0.0
            SCL LN Equity     0.0
            BMA AR Equity     0.0
            GCLA AR Equity    0.0
            EBS AV Equity     0.0
dtype: float64

Does not work:

factor = pd.read_csv('test1780columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()

date                      
1996-12-31  DU UH Equity      0.0
            SCL LN Equity     0.0
            BMA AR Equity     0.0
            GCLA AR Equity    0.0
            EBS AV Equity     0.0
dtype: float64
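Because the stacked head() looks identical in both cases, a useful first check is the dtype of every column of the wide DataFrame before stacking: a single column that pandas read as text (object dtype) is enough to smuggle strings into the data. A minimal sketch, assuming the same ';'-delimited layout with a 'date' column; the helper name `non_numeric_columns` and the sample CSV (with a comma-decimal cell as the hypothetical culprit) are made up for illustration.

```python
import io

import pandas as pd


def non_numeric_columns(wide: pd.DataFrame) -> list:
    """Return the names of columns that pandas did not parse as numbers."""
    return list(wide.select_dtypes(exclude="number").columns)


# Hypothetical ';'-delimited sample: "1,7" cannot be parsed as a float,
# so pandas keeps the whole 'BBB Equity' column as strings (object dtype)
csv_text = (
    "date;AAA Equity;BBB Equity\n"
    "1996-12-31;1.0;2.0\n"
    "1997-01-30;1.5;1,7\n"
)
wide = pd.read_csv(io.StringIO(csv_text), delimiter=";")
wide["date"] = pd.to_datetime(wide["date"])
wide = wide.set_index("date")

print(non_numeric_columns(wide))  # ['BBB Equity']
```

On the real files, running this on the 70-column and the 1780-column frames and comparing the two lists would show whether the larger file contains a text column.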

For those familiar with alphalens (this is the call with 1780 columns):

factor_data = get_clean_factor_and_forward_returns(
    factor,
    prices,
    quantiles=2,
    periods=(1, 5, 10,),
    max_loss=1)

TypeError: unsupported operand type(s) for /: 'str' and 'float'
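This TypeError is what Python raises when a division mixes a str with a number, so at least one value reaching get_clean_factor_and_forward_returns is most likely a string rather than a float. A tiny illustration with a hypothetical object-dtype Series (the values are made up), plus the usual remedy of coercing to numbers with pd.to_numeric:

```python
import pandas as pd

# An object-dtype Series in which one value was left as text (hypothetical data)
prices = pd.Series([10.0, "10,5", 11.0], dtype=object)

message = None
try:
    prices / 2.0  # the str element cannot be divided by a float
except TypeError as exc:
    message = str(exc)
    print(message)

# Coercing turns the unparseable text into NaN, after which arithmetic works
cleaned = pd.to_numeric(prices, errors="coerce")
halved = cleaned / 2.0
print(halved.tolist())
```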

Quite mysterious...

Any leads or ideas, please comment below. Thanks!

Best answer

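Update: I thought replacing the NaNs with 0 had fixed it, but I'm still getting the error. I'm still looking into it, but maybe what I've done so far will give you some ideas. Let me know what you think: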

import pandas as pd
import matplotlib.pyplot as plt

from alphalens.tears import (create_returns_tear_sheet,
                             create_information_tear_sheet,
                             create_turnover_tear_sheet,
                             create_summary_tear_sheet,
                             create_full_tear_sheet,
                             create_event_returns_tear_sheet,
                             create_event_study_tear_sheet)

from alphalens.utils import get_clean_factor_and_forward_returns

# Build prices: wide DataFrame indexed by date, one column per asset
# The skip flag was added for testing; it can be removed
skip = False
prices = pd.read_csv('prices_quant.csv', delimiter=';')
prices['date'] = pd.to_datetime(prices.date)
prices = prices.set_index('date')
prices = prices.fillna(0)  # experiment: replace missing prices with 0
print(prices)

# Build factor: stack the wide DataFrame into a (date, asset) Series
factor = pd.read_csv('test1.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor = factor.fillna(0)
print(factor)

try:
    factor_data = get_clean_factor_and_forward_returns(
        factor,
        prices,
        quantiles=5,
        periods=(1, 5, 10),
        max_loss=1)
except Exception as e:
    # Report the error and skip the tear sheets instead of crashing
    print(e)
    skip = True

if not skip:
    create_full_tear_sheet(factor_data, long_short=True)
    create_event_returns_tear_sheet(factor_data, prices, long_short=True)
    print("\nNo Errors\n")
else:
    print("\nWe encountered an error\n")

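A possible next step, instead of filling NaN with 0, is to force every price and factor column to numeric before calling alphalens, turning any unparseable text into NaN and counting how many cells were affected. A minimal sketch; the helper `coerce_numeric` and the sample frame are hypothetical, and whether leaving NaN in place is acceptable for this analysis is an assumption (get_clean_factor_and_forward_returns drops unusable rows, with max_loss limiting how much may be dropped).

```python
import pandas as pd


def coerce_numeric(wide: pd.DataFrame):
    """Convert every column to float, turning unparseable cells into NaN.

    Returns the cleaned frame and the number of cells that were present
    before but could not be parsed as numbers.
    """
    cleaned = wide.apply(pd.to_numeric, errors="coerce")
    bad_cells = int((cleaned.isna() & wide.notna()).sum().sum())
    return cleaned.astype(float), bad_cells


# Hypothetical wide price table with one text cell ("2,1")
prices = pd.DataFrame(
    {"AAA Equity": [1.0, 1.1, 1.2], "BBB Equity": ["2.0", "2,1", "2.2"]},
    index=pd.to_datetime(["1996-12-31", "1997-01-30", "1997-02-27"]),
)
clean_prices, n_bad = coerce_numeric(prices)
print(n_bad)  # 1
```

Applied to the real data, a non-zero count for prices or for the unstacked factor would confirm that text values are present.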

            DU UH Equity  SCL LN Equity  BMA AR Equity  GCLA AR Equity  EBS AV Equity  OMV AV Equity  ...  RDF SJ Equity  HYP SJ Equity  AEL SJ Equity  MRP SJ Equity  EMI SJ Equity  AXL SJ Equity
date                                                                                                  ...                                                                                          
1996-12-31       0.00000            0.0        9.35256         0.00000        0.00000       11.11470  ...        0.00000        1.25522        1.06860        0.75895        0.00000        0.36700
1997-01-30       0.00000            0.0        9.68044         0.00000        0.00000       11.34016  ...        0.00000        1.25426        1.23754        0.74472        0.00000        0.40870
1997-02-27       0.00000            0.0        9.99271         0.00000        0.00000       11.75658  ...        0.00000        1.21265        1.25000        0.58482        0.00000        0.49981
1997-03-31       0.00000            0.0       11.00760         0.00000        0.00000       11.82128  ...        0.00000        1.27312        1.60597        0.73513        0.00000        0.42375
1997-04-30       0.00000            0.0       10.81243         0.00000        0.00000       10.88544  ...        0.00000        1.24338        1.73112        0.79811        0.00000        0.46649
...                  ...            ...            ...             ...            ...            ...  ...            ...            ...            ...            ...            ...            ...
2018-08-30       1.39123           63.4        4.34430         1.35635       39.73607       52.90799  ...        0.70425        6.94043        1.07509       15.33290        1.08665        0.04015
2018-09-30       1.36945           61.7        4.18581         1.66410       41.55489       56.20015  ...        0.70719        6.51431        1.15394       16.11004        1.05302        0.03670
2018-10-31       1.33406           51.9        4.43810         1.61537       40.70160       55.54638  ...        0.64929        6.11171        1.18551       15.63778        1.00880        0.03318
2018-11-29       1.35312           46.3        4.49952         1.32456       39.43278       50.48753  ...        0.68896        6.40823        1.27159       17.31443        1.06684        0.03305
2018-12-31       1.36956           46.3        4.35219         1.31402       33.23611       43.76183  ...        0.67241        5.66712        1.25163       17.11610        1.02912        0.02990

[265 rows x 1780 columns]
date                      
1996-12-31  DU UH Equity      0.000000
            SCL LN Equity     0.000000
            BMA AR Equity     0.000000
            GCLA AR Equity    0.000000
            EBS AV Equity     0.000000
                                ...   
2018-12-31  HYP SJ Equity     0.029605
            AEL SJ Equity     0.000777
            MRP SJ Equity     0.000000
            EMI SJ Equity     0.000000
            AXL SJ Equity     0.000000
Length: 471700, dtype: float64
unsupported operand type(s) for /: 'int' and 'str'

We encountered an error
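The printed factor above is already float64 (Length: 471700, dtype: float64), yet the error persists, which suggests the string is coming from the prices frame instead. A minimal sketch for listing the exact (date, column, value) cells that are not numbers; `find_text_cells` is a hypothetical helper and the sample frame is made up.

```python
import pandas as pd


def find_text_cells(wide: pd.DataFrame) -> list:
    """List (index, column, value) for non-missing cells that are not numeric."""
    parsed = wide.apply(pd.to_numeric, errors="coerce")
    # A cell is suspicious if it had a value but could not be parsed
    mask = parsed.isna() & wide.notna()
    hits = []
    for column in mask.columns[mask.any().to_numpy()]:
        for idx in wide.index[mask[column].to_numpy()]:
            hits.append((idx, column, wide.at[idx, column]))
    return hits


# Hypothetical prices with one text cell
prices = pd.DataFrame(
    {"AAA Equity": [1.0, 1.1], "BBB Equity": [2.0, "error"]},
    index=pd.to_datetime(["2018-11-29", "2018-12-31"]),
)
for date, column, value in find_text_cells(prices):
    print(date.date(), column, repr(value))  # 2018-12-31 BBB Equity 'error'
```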

Let me know if this helps.

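Original post: Some of the prices are NaN, and I'm wondering whether that causes the problem, and whether changing them to 0 would make a difference. I'm not sure, but since the error comes from dividing by a string, I suspect this could be the root cause. It's only a guess, though: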

$ python3 code_alphalens_analysis 
            DU UH Equity  SCL LN Equity  BMA AR Equity  GCLA AR Equity  EBS AV Equity  OMV AV Equity  ...  RDF SJ Equity  HYP SJ Equity  AEL SJ Equity  MRP SJ Equity  EMI SJ Equity  AXL SJ Equity
date                                                                                                  ...                                                                                          
1996-12-31           NaN            NaN        9.35256             NaN            NaN       11.11470  ...            NaN        1.25522        1.06860        0.75895            NaN        0.36700
1997-01-30           NaN            NaN        9.68044             NaN            NaN       11.34016  ...            NaN        1.25426        1.23754        0.74472            NaN        0.40870
1997-02-27           NaN            NaN        9.99271             NaN            NaN       11.75658  ...            NaN        1.21265        1.25000        0.58482            NaN        0.49981
1997-03-31           NaN            NaN       11.00760             NaN            NaN       11.82128  ...            NaN        1.27312        1.60597        0.73513            NaN        0.42375
1997-04-30           NaN            NaN       10.81243             NaN            NaN       10.88544  ...            NaN        1.24338        1.73112        0.79811            NaN        0.46649
...                  ...            ...            ...             ...            ...            ...  ...            ...            ...            ...            ...            ...            ...
2018-08-30       1.39123           63.4        4.34430         1.35635       39.73607       52.90799  ...        0.70425        6.94043        1.07509       15.33290        1.08665        0.04015
2018-09-30       1.36945           61.7        4.18581         1.66410       41.55489       56.20015  ...        0.70719        6.51431        1.15394       16.11004        1.05302        0.03670
2018-10-31       1.33406           51.9        4.43810         1.61537       40.70160       55.54638  ...        0.64929        6.11171        1.18551       15.63778        1.00880        0.03318
2018-11-29       1.35312           46.3        4.49952         1.32456       39.43278       50.48753  ...        0.68896        6.40823        1.27159       17.31443        1.06684        0.03305
2018-12-31       1.36956           46.3        4.35219         1.31402       33.23611       43.76183  ...        0.67241        5.66712        1.25163       17.11610        1.02912        0.02990
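On whether turning NaN prices into 0 matters: a simple period return is p_t / p_{t-1} - 1, so a 0 followed by a real price produces an infinite return, whereas a NaN just propagates as NaN. A small illustration with made-up prices; this is a generic simple-return computation, not alphalens' own forward-return code.

```python
import numpy as np
import pandas as pd


def simple_returns(prices: pd.Series) -> pd.Series:
    """One-period simple returns: p_t / p_{t-1} - 1."""
    return prices / prices.shift(1) - 1


# Made-up prices: the asset has no data for the first two periods
raw = pd.Series([np.nan, np.nan, 1.39, 1.37])
filled = raw.fillna(0)

print(simple_returns(raw).tolist())     # NaN, NaN, NaN, then a finite return
print(simple_returns(filled).tolist())  # NaN, NaN, inf, then a finite return
```

So filling with 0 does not remove the string problem and can introduce infinite returns of its own.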

The original question, "python - Code works for a DataFrame with 70 columns but fails with 1000 columns (same data structure)", is on Stack Overflow: https://stackoverflow.com/questions/58782880/
