python - Create a Pandas DataFrame column based on strings in two other columns

Tags: python python-3.x pandas numpy dataframe

I have a DataFrame that looks like this:

boat_type   boat_type_2
Not Known   Not Known
Not Known   kayak
ship        Not Known
Not Known   Not Known
ship        Not Known

I want to create a third column, boat_type_final, that should look like this:

boat_type   boat_type_2  boat_type_final
Not Known   Not Known    cruise
Not Known   kayak        kayak
ship        Not Known    ship  
Not Known   Not Known    cruise
ship        Not Known    ship

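So basically, if "Not Known" appears in both boat_type and boat_type_2, the value should be "cruise". But if either of the first two columns contains a string other than "Not Known", boat_type_final should be filled with that string, "kayak" or "ship".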

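What is the most elegant way to do this? I have seen several options, such as where, writing a function, and/or boolean logic, and I would like to know what a true Pythonista would do.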

Here is my code so far:

import pandas as pd
import numpy as np
data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
    {'boat_type': 'Not Known',  'boat_type_2': 'kayak'},
    {'boat_type': 'ship',  'boat_type_2': 'Not Known'},
    {'boat_type': 'Not Known',  'boat_type_2': 'Not Known'},
    {'boat_type': 'ship',  'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)
# unfinished attempt:
# df['boat_type_final'] = np.where(df.boat_type.str.contains('Not'))...
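The np.where attempt above can be completed with two nested np.where calls. This is a minimal sketch, not the accepted answer. It assumes "Not Known" is the only placeholder and, because the question does not say what should happen when both columns hold a real value, it assumes boat_type wins in that case. It compares with != rather than str.contains('Not'), so a real value that happens to contain "Not" is not misread as missing.

```python
import numpy as np
import pandas as pd

data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'kayak'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)

# Boolean masks: True where the column holds a real (known) value
known_1 = df['boat_type'] != 'Not Known'
known_2 = df['boat_type_2'] != 'Not Known'

# Inner np.where falls back from boat_type_2 to 'cruise';
# outer np.where gives boat_type priority (assumed precedence)
df['boat_type_final'] = np.where(known_1, df['boat_type'],
                                 np.where(known_2, df['boat_type_2'], 'cruise'))
print(df['boat_type_final'].tolist())
# ['cruise', 'kayak', 'ship', 'cruise', 'ship']
```

Nesting stays readable for two columns; with more columns the answer's ffill chain or np.select scales better.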

Best answer

Use:

df['boat_type_final'] = (df.replace('Not Known',np.nan)
                           .ffill(axis=1)
                           .iloc[:, -1]
                           .fillna('cruise'))
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship

Explanation:

First, replace Not Known with missing values (NaN) using replace:

print (df.replace('Not Known',np.nan))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship         NaN
3       NaN         NaN
4      ship         NaN

Then fill the NaNs by forward-filling along each row (ffill with axis=1):

print (df.replace('Not Known',np.nan).ffill(axis=1))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship        ship
3       NaN         NaN
4      ship        ship
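Because the forward fill runs left to right, the last column ends up holding the rightmost known value, so boat_type_2 takes precedence whenever both columns are known. The question's data never has that case, so the sketch below uses a hypothetical demo row to show it. It also shows a bfill variant (an alternative, not part of the answer) that takes the first column instead, so boat_type wins.

```python
import numpy as np
import pandas as pd

# Hypothetical row where BOTH columns are known (not in the question's data)
demo = pd.DataFrame({'boat_type': ['ship'], 'boat_type_2': ['kayak']})

# Answer's chain: forward fill, take the LAST column -> rightmost known value wins
last_wins = (demo.replace('Not Known', np.nan)
                 .ffill(axis=1)
                 .iloc[:, -1]
                 .fillna('cruise'))
print(last_wins.tolist())   # ['kayak']

# Variant: back fill, take the FIRST column -> leftmost known value wins
first_wins = (demo.replace('Not Known', np.nan)
                  .bfill(axis=1)
                  .iloc[:, 0]
                  .fillna('cruise'))
print(first_wins.tolist())  # ['ship']
```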

Select the last column by position with iloc:

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1])
0      NaN
1    kayak
2     ship
3      NaN
4     ship
Name: boat_type_2, dtype: object

Finally, replace the remaining NaNs (rows where both columns were Not Known) with cruise using fillna:

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1].fillna('cruise'))
0    cruise
1     kayak
2      ship
3    cruise
4      ship
Name: boat_type_2, dtype: object
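To reuse the chain, it can be wrapped in a function that looks only at the two source columns, so no other column in the frame can end up as the last column picked by iloc. The helper name final_boat_type and its parameters are hypothetical, chosen for this sketch.

```python
import numpy as np
import pandas as pd

def final_boat_type(frame, cols=('boat_type', 'boat_type_2'),
                    placeholder='Not Known', default='cruise'):
    """Hypothetical helper: rightmost known value among `cols`, else `default`."""
    return (frame[list(cols)]                 # restrict to the source columns
            .replace(placeholder, np.nan)     # placeholder -> NaN
            .ffill(axis=1)                    # carry known values rightwards
            .iloc[:, -1]                      # last column holds the result
            .fillna(default))                 # rows with no known value

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})
df['boat_type_final'] = final_boat_type(df)
print(df['boat_type_final'].tolist())
# ['cruise', 'kayak', 'ship', 'cruise', 'ship']
```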

If there are only a few columns, another solution is numpy.select (note that this version hard-codes the values ship and kayak):

m1 = df['boat_type'] == 'ship'
m2 = df['boat_type_2'] == 'kayak'

df['boat_type_final'] = np.select([m1, m2], ['ship','kayak'], default='cruise')
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
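The numpy.select version above hard-codes ship and kayak, so any other boat type would fall through to cruise. A minimal sketch of a more general variant, assuming Not Known is the only placeholder: the conditions test for a known value and the choices are the columns themselves. Because np.select picks the first matching condition, boat_type wins when both columns are known (an assumption, since the question does not cover that case).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# Conditions are checked in order; the first True one selects its choice
conditions = [df['boat_type'] != 'Not Known',
              df['boat_type_2'] != 'Not Known']
# Choices are whole columns, so any known value is carried over, not just ship/kayak
choices = [df['boat_type'], df['boat_type_2']]

df['boat_type_final'] = np.select(conditions, choices, default='cruise')
print(df['boat_type_final'].tolist())
# ['cruise', 'kayak', 'ship', 'cruise', 'ship']
```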

Source: the Stack Overflow question "python - Create a Pandas DataFrame column based on strings in two other columns": https://stackoverflow.com/questions/51515724/
