I have a dataframe named passenger_details, shown below:
Passenger Age Gender Commute_to_work Commute_mode Commute_time ...
Passenger1 32 Male I drive to work car 1 hour
Passenger2 26 Female I take the metro train NaN ...
Passenger3 33 Female NaN NaN 30 mins ...
Passenger4 29 Female I take the metro train NaN ...
...
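For the columns whose headers contain the string "Commute", I want to apply an if-style function that converts missing values (NaN) to 0 and present values to 1.
This is essentially what I want to achieve: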
Passenger Age Gender Commute_to_work Commute_mode Commute_time ...
Passenger1 32 Male 1 1 1
Passenger2 26 Female 1 1 0 ...
Passenger3 33 Female 0 0 1 ...
Passenger4 29 Female 1 1 0 ...
...
However, I am struggling with how to write the code. This is what I have tried:
passenger_details = passenger_details.filter(regex = 'Location_', axis = 1).apply(lambda value: str(value).replace('value', '1', 'NaN','0'))
But I get a TypeError:
'replace() takes at most 3 arguments (4 given)'
Any help would be appreciated.
Best answer
Select the columns with Index.str.contains, test for non-missing values with DataFrame.notna, and finally cast the resulting True/False values to the integers 1/0:
c = df.columns.str.contains('Commute')
df.loc[:, c] = df.loc[:, c].notna().astype(int)
print(df)
Passenger Age Gender Commute_to_work Commute_mode Commute_time
0 Passenger1 32 Male 1 1 1
1 Passenger2 26 Female 1 1 0
2 Passenger3 33 Female 0 0 1
3 Passenger4 29 Female 1 1 0
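
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the four rows shown in the question (the elided "..." columns are left out), and the names `mask` and `commute_cols` are only illustrative. It assigns with `df[commute_cols] = ...` instead of `df.loc[:, c] = ...`, which is an assumption on my part that replacing the columns outright is acceptable; it avoids relying on how a given pandas version handles dtypes for in-place `.loc` assignment.

```python
import pandas as pd

# Sample data reconstructed from the question; NaN marks a missing answer.
nan = float("nan")
df = pd.DataFrame({
    "Passenger": ["Passenger1", "Passenger2", "Passenger3", "Passenger4"],
    "Age": [32, 26, 33, 29],
    "Gender": ["Male", "Female", "Female", "Female"],
    "Commute_to_work": ["I drive to work", "I take the metro", nan, "I take the metro"],
    "Commute_mode": ["car", "train", nan, "train"],
    "Commute_time": ["1 hour", nan, "30 mins", nan],
})

# Boolean mask over the column labels: True where the name contains "Commute".
mask = df.columns.str.contains("Commute")
commute_cols = df.columns[mask]

# notna() gives True for present values and False for NaN;
# astype(int) maps True/False to 1/0.
df[commute_cols] = df[commute_cols].notna().astype(int)

print(df)
```

As for the original attempt: `str.replace` accepts at most `old`, `new`, and an optional `count`, so passing four arguments raises the TypeError. In addition, the lambda given to `apply` receives a whole column (a Series) rather than a single cell, so `str(value)` would not operate on individual values anyway.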
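Regarding python - applying a function to columns in a dataframe whose column headers contain a specific string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55043537/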