I have a DataFrame in pandas:
import pandas as pd
# assign data of lists.
data = {'Gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
        'Employment': ['R', 'U', 'E', 'R', 'U', 'E', 'R', 'U', 'E', 'R', 'U', 'E'],
        'Age': ['Y', 'M', 'O', 'Y', 'M', 'O', 'Y', 'M', 'O', 'Y', 'M', 'O']
        }
# Create DataFrame
df = pd.DataFrame(data)
df
What I want is to create a new column for each category of each existing column, named in the following format:
Gender_M -> for when the gender equals M
Gender_F -> for when the gender equals F
Employment_R -> for when employment equals R
Employment_U -> for when employment equals U
and so on...
So far, I have written the following code:
for i in range(len(df.columns)):
    current_column = list(df.columns)[i]
    col_df_array = df[current_column].unique()
    for j in range(col_df_array.size):
        new_col_name = str(list(df.columns)[i]) + "_" + col_df_array[j]
        for index, row in df.iterrows():
            if row[current_column] == col_df_array[j]:
                df[new_col_name] = row[current_column]
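The problem is that although I have managed to create the column names, I cannot get the correct column values.
For example, the Gender columns should look like this: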
data2 = {'Gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
         'Gender_M': ['M', 'na', 'M', 'na', 'M', 'na', 'M', 'na', 'M', 'na', 'M', 'na'],
         'Gender_F': ['na', 'F', 'na', 'F', 'na', 'F', 'na', 'F', 'na', 'F', 'na', 'F']
         }
df2 = pd.DataFrame(data2)
Note that na can be anything, for example a blank, a dot, or NaN.
Best answer
You are looking for pd.get_dummies.
>>> pd.get_dummies(df)
    Gender_F  Gender_M  Employment_E  Employment_R  Employment_U  Age_M  Age_O  Age_Y
0          0         1             0             1             0      0      0      1
1          1         0             0             0             1      1      0      0
2          0         1             1             0             0      0      1      0
3          1         0             0             1             0      0      0      1
4          0         1             0             0             1      1      0      0
5          1         0             1             0             0      0      1      0
6          0         1             0             1             0      0      0      1
7          1         0             0             0             1      1      0      0
8          0         1             1             0             0      0      1      0
9          1         0             0             1             0      0      0      1
10         0         1             0             0             1      1      0      0
11         1         0             1             0             0      0      1      0
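If you want the layout from the question (the category label where it matches, a placeholder elsewhere) rather than 0/1 indicators, one possible approach is Series.where. The sketch below is not part of the original answer. The names dummies, labelled and result are illustrative, and NaN is assumed as the placeholder. It also passes dtype=int to pd.get_dummies, because pandas 2.0 and later return boolean columns by default.

```python
import pandas as pd

# Same data as in the question, written compactly.
data = {
    'Gender': ['M', 'F'] * 6,
    'Employment': ['R', 'U', 'E'] * 4,
    'Age': ['Y', 'M', 'O'] * 4,
}
df = pd.DataFrame(data)

# 0/1 indicator columns; dtype=int asks for integers explicitly,
# since pandas 2.0+ would otherwise return True/False.
dummies = pd.get_dummies(df, dtype=int)

# Variant matching the question's format: keep the label where the row
# matches the category and leave NaN everywhere else.
labelled = pd.DataFrame({
    f"{col}_{val}": df[col].where(df[col] == val)
    for col in df.columns
    for val in df[col].unique()
})

# Put the new columns next to the original ones.
result = df.join(labelled)
print(result[['Gender', 'Gender_M', 'Gender_F']].head())
```

To use a different placeholder, such as a dot, replace the NaN values afterwards with labelled.fillna('.').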
Regarding "pandas - Create columns in a Python DataFrame based on existing column names and column values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70406840/