python - How to build an index from multiple columns and set it as a column in a pandas DataFrame?

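I want to learn how to create a DataFrame column that holds a code mapped from multiple columns.

In the partial example below I am following a clumsy approach: get the unique values as a temporary DataFrame; concatenate a prefix string to the temporary row number as a new column; and join the two DataFrames.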

import pandas as pd

df = pd.DataFrame({'col1' : ['A1', 'A2', 'A1', 'A3'],
                   'col2' : ['B1', 'B2', 'B1', 'B1'],
                   'value' : [100, 200, 300, 400],
                   })

tmp = df[['col1','col2']].drop_duplicates(['col1', 'col2'])


#   col1 col2
# 0   A1   B1
# 1   A2   B2
# 3   A3   B1

The first question is: how do I get the row numbers of tmp as values in a column of tmp?

What is a smart, pythonic way to get the following result from df?

dfnew = pd.DataFrame({'col1' : ['A1', 'A2', 'A1', 'A3'],
                   'col2' : ['B1', 'B2', 'B1', 'B1'],
                   'code' :  ['CODE0','CODE1', 'CODE0', 'CODE3'],
                   'value' : [100, 200, 300, 400],
                   })

    code col1 col2  value
0  CODE0   A1   B1    100
1  CODE1   A2   B2    200
2  CODE0   A1   B1    300
3  CODE3   A3   B1    400

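Thanks.

After the answers came in, as an exercise I kept working on the non-pythonic version I had in mind, took insights from the good answers, and arrived at this: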

tmp = df[['col1','col2']].drop_duplicates(['col1', 'col2'])

tmp.reset_index(inplace=True)

tmp.drop('index', axis=1, inplace=True)

tmp['code'] = tmp.index.to_series().apply(lambda x: 'code' + format(x, '04d'))

dfnew = pd.merge(df, tmp, on=['col1', 'col2'])

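When I posted this question, I had not realized that it would be better to reset the index to a fresh sequence instead of keeping the original index numbers.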
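If sequential codes are the goal, one possible shortcut is `GroupBy.ngroup()`, which numbers each group directly. This is a sketch of a different technique from the one in the accepted answer; it assumes a pandas version that provides `ngroup()`, and the names `group_no` and `dfnew` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A1', 'A2', 'A1', 'A3'],
                   'col2': ['B1', 'B2', 'B1', 'B1'],
                   'value': [100, 200, 300, 400]})

# ngroup() assigns one integer per (col1, col2) group;
# sort=False numbers the groups in order of first appearance
group_no = df.groupby(['col1', 'col2'], sort=False).ngroup()

# zero-padded code, same format as format(x, '04d') in the snippet above
dfnew = df.assign(code=group_no.map('code{:04d}'.format))
```

No temporary DataFrame or merge is needed here, because the group number is computed row by row on df itself.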

I tried some variations, but I could not figure out how to chain 'reset_index' and 'drop' in a single command.
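For the chaining question, `reset_index` accepts `drop=True`, which discards the old index instead of turning it into an 'index' column, so the separate `drop` call is not needed. A minimal sketch of the exercise rewritten that way (the list comprehension for the code column is just one of several options):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A1', 'A2', 'A1', 'A3'],
                   'col2': ['B1', 'B2', 'B1', 'B1'],
                   'value': [100, 200, 300, 400]})

# unique (col1, col2) pairs, renumbered 0, 1, 2, ... in one chained expression;
# drop=True throws away the old index rather than keeping it as a column
tmp = (df[['col1', 'col2']]
       .drop_duplicates()
       .reset_index(drop=True))

# build the code from the new sequential row number
tmp['code'] = ['code' + format(i, '04d') for i in tmp.index]

# attach the code to every matching row of the original frame
dfnew = pd.merge(df, tmp, on=['col1', 'col2'])
```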

I am starting to like Python. Thank you all.

Best answer

Use groupby on df.index by ['col1', 'col2'], together with transform('first') and map

df.assign(
    code=df.index.to_series().groupby(
        [df.col1, df.col2]
    ).transform('first').map('CODE{}'.format)
)[['code'] + df.columns.tolist()]

    code col1 col2  value
0  CODE0   A1   B1    100
1  CODE1   A2   B2    200
2  CODE0   A1   B1    300
3  CODE3   A3   B1    400

Explanation

# turn index to series so I can perform a groupby on it
idx_series = df.index.to_series()

# groupby col1 and col2 to establish uniqueness
idx_gb = idx_series.groupby([df.col1, df.col2])

# get first index value in each unique group
# and broadcast over entire group with transform
idx_tf = idx_gb.transform('first')

# map a format function to get desired string
code = idx_tf.map('CODE{}'.format)

# use assign to create new column
df.assign(code=code)
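The steps above can be collected into a small reusable function. This is only a sketch: `add_first_index_code`, its parameters and the default prefix are hypothetical names introduced for illustration, not part of pandas or of the original answer.

```python
import pandas as pd

def add_first_index_code(df, keys, prefix='CODE'):
    """Hypothetical helper: label each row with prefix + the first
    index value found in its group of key columns."""
    # index as a Series, grouped by the key columns
    idx_gb = df.index.to_series().groupby([df[k] for k in keys])
    # first index value of each group, broadcast to every row of the group
    first_idx = idx_gb.transform('first')
    # format the index value into the code string
    code = first_idx.map((prefix + '{}').format)
    # add the column and move it to the front
    return df.assign(code=code)[['code'] + df.columns.tolist()]

df = pd.DataFrame({'col1': ['A1', 'A2', 'A1', 'A3'],
                   'col2': ['B1', 'B2', 'B1', 'B1'],
                   'value': [100, 200, 300, 400]})

result = add_first_index_code(df, ['col1', 'col2'])
```

Because the code is derived from the index labels, the numbers follow whatever index df carries; with the default 0..n-1 index they are the positions of the first occurrence of each pair, which is why A3/B1 receives CODE3 rather than CODE2.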

Regarding python - How to build an index from multiple columns and set it as a column in a pandas DataFrame?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41519908/
