python - Create a Pandas DataFrame column based on the strings in two other columns

I have a DataFrame that looks like this:

boat_type   boat_type_2
Not Known   Not Known
Not Known   kayak
ship        Not Known
Not Known   Not Known
ship        Not Known

I want to create a third column, boat_type_final, which should look like this:

boat_type   boat_type_2  boat_type_final
Not Known   Not Known    cruise
Not Known   kayak        kayak
ship        Not Known    ship  
Not Known   Not Known    cruise
ship        Not Known    ship

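So basically, if both boat_type and boat_type_2 contain "Not Known", the value should be "cruise". However, if either of the first two columns contains a string other than "Not Known", boat_type_final should be filled with that string, "kayak" or "ship".

What is the most elegant way to do this? I have seen several options, such as where, writing a function, and/or boolean logic, and I would like to know what a true Pythonista would do.

Here is my code so far: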

import pandas as pd
import numpy as np
data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
    {'boat_type': 'Not Known',  'boat_type_2': 'kayak'},
    {'boat_type': 'ship',  'boat_type_2': 'Not Known'},
    {'boat_type': 'Not Known',  'boat_type_2': 'Not Known'},
    {'boat_type': 'ship',  'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)
df['phone_type_final'] = np.where(df.phone_type.str.contains('Not'))...

Best answer

Use:

df['boat_type_final'] = (df.replace('Not Known',np.nan)
                           .ffill(axis=1)
                           .iloc[:, -1]
                           .fillna('cruise'))
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship

Explanation:

First, replace Not Known with missing values:

print (df.replace('Not Known',np.nan))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship         NaN
3       NaN         NaN
4      ship         NaN

Then replace NaN by forward filling along each row:

print (df.replace('Not Known',np.nan).ffill(axis=1))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship        ship
3       NaN         NaN
4      ship        ship

Select the last column by position with iloc:

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1])
0      NaN
1    kayak
2     ship
3      NaN
4     ship
Name: boat_type_2, dtype: object

Finally, replace any remaining NaN with fillna:

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1].fillna('cruise'))
0    cruise
1     kayak
2      ship
3    cruise
4      ship
Name: boat_type_2, dtype: object
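
The chain above calls replace on the whole DataFrame, so any other columns would also take part in the forward fill. A minimal sketch of one way to avoid that, assuming the frame carries an extra column (trip_id is a hypothetical name, not from the question), is to select the two source columns first:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type': ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
    'trip_id': [1, 2, 3, 4, 5],  # hypothetical extra column, for illustration only
})

# Restrict the replace/ffill chain to the columns that hold the boat types
cols = ['boat_type', 'boat_type_2']

df['boat_type_final'] = (df[cols].replace('Not Known', np.nan)  # mark unknowns as missing
                                 .ffill(axis=1)                 # carry known values rightwards
                                 .iloc[:, -1]                   # last column holds the result
                                 .fillna('cruise'))             # both unknown -> default
print(df)
```

Note that if both columns hold a known value in the same row, forward filling keeps the rightmost one (boat_type_2). Swapping in bfill(axis=1) with iloc[:, 0] would prefer boat_type instead.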

If there are only a few columns, another solution is to use numpy.select:

m1 = df['boat_type'] == 'ship'
m2 = df['boat_type_2'] == 'kayak'

df['boat_type_final'] = np.select([m1, m2], ['ship','kayak'], default='cruise')
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
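
The masks above hard-code 'ship' and 'kayak'. As a sketch of a variant that does not depend on the specific boat names (the mask names known_1 and known_2 are illustrative, not from the original answer), the conditions can test for anything other than "Not Known" and the choices can be the columns themselves:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type': ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# True where the column holds an actual boat type
known_1 = df['boat_type'] != 'Not Known'
known_2 = df['boat_type_2'] != 'Not Known'

# np.select picks the choice for the first condition that is True in each row,
# and falls back to the default when none is True
df['boat_type_final'] = np.select(
    [known_1, known_2],
    [df['boat_type'], df['boat_type_2']],
    default='cruise',
)
print(df)
```

Because np.select uses the first matching condition, boat_type takes precedence when both columns are known in the same row.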

Regarding python - Create a Pandas DataFrame column based on the strings in two other columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51515724/