I think the following code is extremely inefficient. Is there a better way to do this kind of general recoding in pandas?
df['F'] = 0
df['F'][(df['B'] >= 3) & (df['C'] >= 4.35)] = 1
df['F'][(df['B'] >= 3) & (df['C'] < 4.35)] = 2
df['F'][(df['B'] < 3) & (df['C'] >= 4.35)] = 3
df['F'][(df['B'] < 3) & (df['C'] < 4.35)] = 4
Best answer
Use numpy.select, and cache the boolean masks in variables for better performance:
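import numpy as np

# Compute each boolean mask once and reuse it
m1 = df['B'] >= 3
m2 = df['C'] >= 4.35
m3 = df['C'] < 4.35
m4 = df['B'] < 3
# The first matching condition wins; rows matching none get the default
df['F'] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)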
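To try this end to end, here is a minimal self-contained sketch. The sample values in df are made up for illustration and are not from the original question. The last row has a missing B on purpose: comparisons against NaN evaluate to False, so none of the four conditions match and the row falls through to default=0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the original question)
df = pd.DataFrame({
    "B": [1, 3, 5, 2, 4, np.nan],
    "C": [4.0, 4.35, 5.0, 4.5, 1.0, 5.0],
})

# Cache each boolean mask so every comparison runs only once
m1 = df["B"] >= 3
m2 = df["C"] >= 4.35
m3 = df["C"] < 4.35
m4 = df["B"] < 3

# Conditions are checked in order; the first one that is True picks the value
df["F"] = np.select(
    [m1 & m2, m1 & m3, m4 & m2, m4 & m3],
    [1, 2, 3, 4],
    default=0,
)

print(df)
```

With complete data the four conditions cover every row, so default=0 should only appear where B or C is missing. Keeping m3 and m4 as explicit comparisons rather than writing ~m2 and ~m1 is what preserves that behavior, since negating a mask would turn a NaN comparison into True.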
This question, "python - Pandas: create a new variable based on two existing variables", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/50851137/