python - Pandas DataFrame - renaming multiple identically named columns

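I have several columns with the same name in a df. I need to rename them, but the problem is that df.rename renames all of them in the same way. How can I rename the blah columns below to blah1, blah4, and blah5?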

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns = ['blah','blah2','blah3','blah','blah']
df

#     blah  blah2  blah3  blah  blah
# 0   0     1      2      3     4
# 1   5     6      7      8     9

Here is what happens when using the df.rename method:

df.rename(columns={'blah':'blah1'})

#     blah1  blah2  blah3  blah1  blah1
# 0   0      1      2      3      4
# 1   5      6      7      8      9
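
A rough pure-Python model of why this happens, assuming rename looks up each column label in the mapping without regard to its position. The helper rename_labels is hypothetical and only imitates that behaviour; it is not part of pandas.

```python
def rename_labels(columns, mapping):
    """Hypothetical stand-in for a label-based rename.

    Every label is looked up in the mapping on its own, so all
    columns that share a label receive the same new name.
    """
    return [mapping.get(col, col) for col in columns]


columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']
renamed = rename_labels(columns, {'blah': 'blah1'})
print(renamed)
```

Because the lookup is keyed on the label alone, a mapping has no way to tell the three blah columns apart.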

Best answer

Starting with Pandas 0.19.0, pd.read_csv() has improved support for duplicate column names.

So we can try to use an internal method:

In [137]: pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
Out[137]: ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']

Since Pandas 1.3.0:

pd.io.parsers.base_parser.ParserBase({'names':df.columns, 'usecols':None})._maybe_dedup_names(df.columns)

This is the "magic" function:

def _maybe_dedup_names(self, names):
    # see gh-7160 and gh-9424: this helps to provide
    # immediate alleviation of the duplicate names
    # issue and appears to be satisfactory to users,
    # but ultimately, not needing to butcher the names
    # would be nice!
    if self.mangle_dupe_cols:
        names = list(names)  # so we can index
        counts = {}

        for i, col in enumerate(names):
            cur_count = counts.get(col, 0)

            if cur_count > 0:
                names[i] = '%s.%d' % (col, cur_count)

            counts[col] = cur_count + 1

    return names
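
Since _maybe_dedup_names is a private method whose location has already moved between versions, the same loop can be kept as a standalone function. The following is a minimal sketch, not pandas API: dedup_names reimplements the logic above in plain Python, and rename_by_position is a hypothetical helper that assumes the goal is the exact names from the question (blah1, blah4, blah5).

```python
def dedup_names(names):
    """Append .1, .2, ... to repeated names, mirroring the loop above."""
    names = list(names)  # copy so we can assign by index
    counts = {}
    for i, col in enumerate(names):
        cur_count = counts.get(col, 0)
        if cur_count > 0:
            names[i] = f"{col}.{cur_count}"
        counts[col] = cur_count + 1
    return names


def rename_by_position(names, target, replacements):
    """Replace occurrences of `target`, left to right, with `replacements`.

    Hypothetical helper: if there are fewer replacements than
    occurrences, the remaining occurrences keep their original name.
    """
    replacements = iter(replacements)
    return [next(replacements, col) if col == target else col
            for col in names]


columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']
deduped = dedup_names(columns)
explicit = rename_by_position(columns, 'blah', ['blah1', 'blah4', 'blah5'])
print(deduped)
print(explicit)
```

With a DataFrame, either result can be assigned back, for example df.columns = dedup_names(df.columns).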

Regarding python - Pandas DataFrame - renaming multiple identically named columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54337057/
