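I am trying to compute the Herfindahl index using apply. At the moment I do it by converting the DataFrame to a NumPy matrix. The function evalHerfindahlIndex works well and returns the correct Herfindahl index for each row. However, when I try to do the same thing with apply, using the function evalHerfindahlIndexForDF, I get a very strange error: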
ValueError: ("No axis named 1 for object type <class 'pandas.core.series.Series'>", 'occurred at index A')
The full code is:
import pandas as pd
import numpy as np
import datetime

def evalHerfindahlIndex(x):
    soma = np.sum(x, axis=1)
    y = np.empty(np.shape(x))
    for line in range(len(soma)):
        y[line, :] = np.power(x[line, :] / soma[line], 2.0)
    hhi = np.sum(y, axis=1)
    return hhi

def evalHerfindahlIndexForDF(x):
    soma = x.sum(axis=1)

def creatingDataFrame():
    dateList = []
    dateList.append(datetime.date(2002, 1, 1))
    dateList.append(datetime.date(2002, 2, 1))
    dateList.append(datetime.date(2002, 1, 1))
    dateList.append(datetime.date(2002, 1, 1))
    dateList.append(datetime.date(2002, 2, 1))
    raw_data = {'Date': dateList,
                'Company': ['A', 'B', 'B', 'C', 'C'],
                'var1': [10, 20, 30, 40, 50]}
    df = pd.DataFrame(raw_data, columns=['Date', 'Company', 'var1'])
    df.loc[1, 'var1'] = np.nan
    return df

if __name__ == "__main__":
    df = creatingDataFrame()
    print(df)
    dfPivot = df.pivot(index='Date', columns='Company', values='var1')
    #print(dfPivot)
    dfPivot = dfPivot.fillna(0)
    dfPivot['Date'] = dfPivot.index
    listOfCompanies = list(set(df['Company']))
    Pivot = dfPivot.as_matrix(columns=listOfCompanies)
    print(evalHerfindahlIndex(Pivot))
    print(dfPivot)
    print(dfPivot[listOfCompanies].apply(evalHerfindahlIndexForDF))
The DataFrame I am using is dfPivot:
Company        A     B     C        Date
Date
2002-01-01  10.0  30.0  40.0  2002-01-01
2002-02-01   0.0   0.0  50.0  2002-02-01
The correct Herfindahl index values, as computed by evalHerfindahlIndex, are:
[0.40625 1. ]
I would like to return this as an extra column of the DataFrame dfPivot.
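A note on the error message quoted above, offered as a minimal sketch rather than a definitive diagnosis: DataFrame.apply defaults to axis=0, so it calls the function once per column and hands it a one-dimensional Series, and a Series has no axis 1. The small frame below is an illustrative stand-in for dfPivot[listOfCompanies]; the names df, column_types and error_message are assumptions made for this example and do not appear in the original code.

```python
import pandas as pd

# Illustrative stand-in for dfPivot[listOfCompanies] (assumed data layout).
df = pd.DataFrame({"A": [10.0, 0.0], "B": [30.0, 0.0], "C": [40.0, 50.0]})

# With the default axis=0, apply passes each column to the function.
# Record the type of the object the function receives for every column.
column_types = df.apply(lambda col: type(col).__name__)
print(column_types)

# A Series is one-dimensional, so asking it to sum along axis=1 fails.
try:
    df["A"].sum(axis=1)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

This suggests the function passed to apply either has to work on a single Series, or the computation has to be done on the whole DataFrame at once.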
Best answer
Consider updating your function, and then converting the returned array to a pandas Series, before updating the call:
def evalHerfindahlIndex(df):
    # Move the matrix conversion inside the function.
    # DataFrame.as_matrix was removed in pandas 1.0; to_numpy() replaces it.
    x = df[listOfCompanies].to_numpy()
    soma = np.sum(x, axis=1)
    y = np.empty(np.shape(x))
    for line in range(len(soma)):
        y[line, :] = np.power(x[line, :] / soma[line], 2.0)
    hhi = pd.Series(np.sum(y, axis=1))  # convert to a Series
    return hhi

...

if __name__ == "__main__":
    df = creatingDataFrame()
    print(df)
    dfPivot = df.pivot(index='Date', columns='Company', values='var1')
    #print(dfPivot)
    dfPivot = dfPivot.fillna(0)
    dfPivot['Date'] = dfPivot.index
    # Defined as in the question; evalHerfindahlIndex reads this module-level name.
    listOfCompanies = list(set(df['Company']))
    # Assign the Series values (.values ignores the Series index)
    dfPivot['HE_Result'] = evalHerfindahlIndex(dfPivot).values
    # Output
    print(evalHerfindahlIndex(dfPivot))
    # 0    0.40625
    # 1    1.00000
    # dtype: float64
    print(dfPivot)
    # Company        A     B     C        Date  HE_Result
    # Date
    # 2002-01-01  10.0  30.0  40.0  2002-01-01    0.40625
    # 2002-02-01   0.0   0.0  50.0  2002-02-01    1.00000
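Since the question asks specifically about apply, here is a hedged alternative sketch, not part of the answer above: passing axis=1 makes apply hand each row to the function, and the same result can also be computed without apply by dividing each row by its total. The helper hhi_row, the column names HHI_apply and HHI_vectorized, and the hard-coded frame are all assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

def hhi_row(row):
    """Herfindahl index of one row: the sum of squared shares (hypothetical helper)."""
    shares = row / row.sum()
    return (shares ** 2).sum()

# Illustrative stand-in for the pivoted data in the question.
df = pd.DataFrame(
    {"A": [10.0, 0.0], "B": [30.0, 0.0], "C": [40.0, 50.0]},
    index=pd.Index(["2002-01-01", "2002-02-01"], name="Date"),
)
companies = ["A", "B", "C"]

# axis=1 makes apply pass each row (not each column) to the function.
df["HHI_apply"] = df[companies].apply(hhi_row, axis=1)

# Vectorized version: divide every row by its total, square, sum across columns.
shares = df[companies].div(df[companies].sum(axis=1), axis=0)
df["HHI_vectorized"] = (shares ** 2).sum(axis=1)

print(df)
```

Both columns keep the Date index of the original frame, so no .values workaround is needed when assigning them. A row whose total is zero would yield NaN in either version and would need separate handling.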
Regarding "python - How to use apply for this function", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51526805/