I am trying to create a DataFrame in which one of the fields is computed by a function. To do this, I use the following code:
    import pandas as pd

    def didSurvive(sex):
        return int(sex == "female")

    titanic_df = pd.read_csv("test.csv")
    submission = pd.DataFrame({
        "PassengerId": titanic_df["PassengerId"],
        "Survived": didSurvive(titanic_df["Sex"])
    })
    submission.to_csv('titanic-predictions.csv', index=False)
When I run this code, I get the following error:
D:\Documents\kaggle\titanic>python predictor.py
  File "predictor.py", line 3
    def didSurvive() {
                     ^
SyntaxError: invalid syntax
D:\Documents\kaggle\titanic>python predictor.py
D:\Documents\kaggle\titanic>python predictor.py
D:\Documents\kaggle\titanic>python predictor.py
Traceback (most recent call last):
  File "predictor.py", line 10, in <module>
    "Survived": didSurvive(titanic_df["Sex"])
  File "predictor.py", line 4, in didSurvive
    return int(sex == "female")
  File "C:\Python34\lib\site-packages\pandas\core\series.py", line 92, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to <class 'int'>
D:\Documents\kaggle\titanic>
I think what is happening is that I am calling int() on a Series of bool values rather than on a single bool. How can I fix this?
Best answer
To convert the data type of a Series, you can use the astype() function. This should work:
    def didSurvive(sex):
        return (sex == "female").astype(int)
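
To try the fix without the Kaggle file, here is a minimal sketch that swaps pd.read_csv("test.csv") for a small in-memory DataFrame. The three rows and their PassengerId values are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for test.csv so the example runs without the file.
titanic_df = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Sex": ["male", "female", "male"],
})

def didSurvive(sex):
    # sex == "female" produces a boolean Series;
    # astype(int) converts it element-wise to 0/1.
    return (sex == "female").astype(int)

submission = pd.DataFrame({
    "PassengerId": titanic_df["PassengerId"],
    "Survived": didSurvive(titanic_df["Sex"]),
})

print(submission)
```

The built-in int() expects a single value, which is why it raises a TypeError when handed a whole Series, while astype(int) applies the conversion to every element.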
Regarding "Python Pandas: creating a dataframe using a function for one of the fields", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41081882/