I want to create a unique Id for this data based on the game name and year. The main focus is the Name column.
I have multiple files:
Name Year Level
Pikachu 2007 30
Pikachu 2007 20
Raichu 2007 20
Mew 2007 35

Name Year Level
Pikachu 2008 50
Pikachu 2008 40
Raichu 2008 55
Mew 2008 55

Pokemon Year Level
Squirtle 2008 50
Pidgey 2008 40
Pidgey 2008 55
Ekans 2008 55

This is the result I want:
Game Name Year Level Id
Pokemon Pikachu 2007 30 1
Pokemon Pikachu 2007 20 1
Pokemon Raichu 2007 20 2
Pokemon Mew 2007 35 3
Pokemon Pikachu 2008 50 1
Pokemon Pikachu 2008 40 1
Pokemon Raichu 2008 55 2
Pokemon Mewtwo 2008 55 3
Pokemon Squirtle 2008 60 1
Pokemon Pidgey 2008 45 2
Pokemon Pidgey 2008 52 2
Pokemon Ekans 2008 51 3
I tried this:
for file in files:
    df = pd.read_csv(file,header=0)
    df['Game'] = 'Pokemon'
    for i, p in enumerate(df['Pokemon'].unique(), 1):
        df.loc[i-1,'id'] = i
        df.loc[i-1, 'Pokemon'] = p
    df['Id'] = df['Id'].astype('int')
Best answer
I think you want factorize for each DataFrame. For the final big DataFrame, collect the per-file frames in a list and join them at the end with concat:
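import pandas as pd

out = []
for file in files:
    df = pd.read_csv(file, header=0)
    df['Game'] = 'Pokemon'
    # factorize numbers each distinct Name from 0 in order of first appearance; +1 makes ids start at 1
    df['id'] = pd.factorize(df['Name'])[0] + 1
    out.append(df)

# stack the per-file frames into one DataFrame with a fresh index
df = pd.concat(out, ignore_index=True)
print(df)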
Name Year Level Game id
0 Pikachu 2007 30 Pokemon 1
1 Pikachu 2007 20 Pokemon 1
2 Raichu 2007 20 Pokemon 2
3 Mew 2007 35 Pokemon 3
4 Pikachu 2008 50 Pokemon 1
5 Pikachu 2008 40 Pokemon 1
6 Raichu 2008 55 Pokemon 2
7 Mew 2008 55 Pokemon 3
8 Squirtle 2008 50 Pokemon 1
9 Pidgey 2008 40 Pokemon 2
10 Pidgey 2008 55 Pokemon 2
11 Ekans 2008 55 Pokemon 3
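The approach above can be tried without any files on disk. The following is only a minimal sketch under stated assumptions: `file_a` and `file_b` are hypothetical in-memory stand-ins for two of the CSV files, the rename of a `Pokemon` header to `Name` is an assumption about what the third file means (without it, `df['Name']` would raise a `KeyError` for that file), and the final column selection simply mirrors the column order in the desired result.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for two of the CSV files in the question.
file_a = io.StringIO(
    "Name,Year,Level\nPikachu,2007,30\nPikachu,2007,20\nRaichu,2007,20\nMew,2007,35\n"
)
file_b = io.StringIO(
    "Pokemon,Year,Level\nSquirtle,2008,50\nPidgey,2008,40\nPidgey,2008,55\nEkans,2008,55\n"
)

# factorize returns (codes, uniques); codes start at 0 in order of first appearance.
codes, uniques = pd.factorize(pd.Series(["Pikachu", "Pikachu", "Raichu", "Mew"]))
print(codes.tolist())  # [0, 0, 1, 2]
print(list(uniques))   # ['Pikachu', 'Raichu', 'Mew']

frames = []
for source in (file_a, file_b):
    part = pd.read_csv(source)
    # Assumption: a column headed "Pokemon" holds the same thing as "Name".
    part = part.rename(columns={"Pokemon": "Name"})
    part["Game"] = "Pokemon"
    # Ids restart at 1 for every file because factorize runs per DataFrame.
    part["Id"] = pd.factorize(part["Name"])[0] + 1
    frames.append(part)

# Concatenate, then reorder the columns to match the desired result.
result = pd.concat(frames, ignore_index=True)[["Game", "Name", "Year", "Level", "Id"]]
print(result)
```

Because `factorize` is called inside the loop, the numbering is local to each file: Pikachu and Squirtle both get Id 1. If the Ids should instead be unique across all files, calling `factorize` once on the concatenated frame would be the place to start.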
Regarding python - Create unique Id, enumerate different row values when reading multiple files, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58501772/