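I have several columns with the same name in a df. I need to rename them, but the problem is that the df.rename method renames them all the same way. How can I rename the blah columns below to blah1, blah4, blah5?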
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns = ['blah','blah2','blah3','blah','blah']
df
#    blah  blah2  blah3  blah  blah
# 0     0      1      2     3     4
# 1     5      6      7     8     9
Here is what happens when using the df.rename method:
df.rename(columns={'blah':'blah1'})
#    blah1  blah2  blah3  blah1  blah1
# 0      0      1      2      3      4
# 1      5      6      7      8      9
Best answer
Starting with Pandas 0.19.0, pd.read_csv() has improved support for duplicate column names, so we can try to use its internal method:
In [137]: pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
Out[137]: ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']
Since Pandas 1.3.0:
pd.io.parsers.base_parser.ParserBase({'names':df.columns, 'usecols':None})._maybe_dedup_names(df.columns)
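Because `_maybe_dedup_names` is a private method whose location already changed between versions (as the two calls above show), it may be safer to observe the same behavior through the public `pd.read_csv()` entry point. The following is only a minimal sketch: the CSV text and the names `csv_text` and `df_csv` are made up for illustration and are not part of the original answer.

```python
import io

import pandas as pd

# Illustrative CSV with the same duplicated header as the question's df.
csv_text = "blah,blah2,blah3,blah,blah\n0,1,2,3,4\n5,6,7,8,9\n"

# read_csv de-duplicates repeated header names by appending .1, .2, ...
df_csv = pd.read_csv(io.StringIO(csv_text))

print(list(df_csv.columns))
# ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']
```

This only helps when the data comes from a file or text buffer; for a DataFrame that already exists in memory, the column list has to be rebuilt by hand.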
This is the "magic" function:
def _maybe_dedup_names(self, names):
    # see gh-7160 and gh-9424: this helps to provide
    # immediate alleviation of the duplicate names
    # issue and appears to be satisfactory to users,
    # but ultimately, not needing to butcher the names
    # would be nice!
    if self.mangle_dupe_cols:
        names = list(names)  # so we can index
        counts = {}

        for i, col in enumerate(names):
            cur_count = counts.get(col, 0)

            if cur_count > 0:
                names[i] = '%s.%d' % (col, cur_count)

            counts[col] = cur_count + 1

    return names
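The same counting logic can be copied into a standalone function so that nothing depends on pandas internals. The sketch below does that, and also adds a second helper that produces the exact names the question asks for (blah1, blah4, blah5) by appending each column's 1-based position. Both `dedup_names` and `rename_by_position` are hypothetical helper names introduced here, not pandas functions.

```python
import numpy as np
import pandas as pd


def dedup_names(names):
    """Append '.1', '.2', ... to repeated names (same logic as the method above)."""
    names = list(names)  # copy so the input is not modified
    counts = {}
    for i, col in enumerate(names):
        cur_count = counts.get(col, 0)
        if cur_count > 0:
            names[i] = f"{col}.{cur_count}"
        counts[col] = cur_count + 1
    return names


def rename_by_position(names, target):
    """Rename every column called `target` to target + its 1-based position."""
    return [f"{name}{i}" if name == target else name
            for i, name in enumerate(names, start=1)]


df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

deduped = dedup_names(df.columns)
# ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']

positional = rename_by_position(df.columns, 'blah')
# ['blah1', 'blah2', 'blah3', 'blah4', 'blah5']

# Assigning a full list replaces the labels by position, which avoids
# df.rename's label-based mapping that renames all duplicates alike.
df.columns = positional
```

Assigning to df.columns works here because it replaces labels by position, whereas df.rename maps by label and therefore cannot tell the duplicate columns apart.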
Regarding python - Pandas DataFrame - renaming multiple identically named columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54337057/