The pandas MultiIndex syntax is not easy to discover. In my case, given this dataset:
import numpy as np
import pandas as pd

header = pd.MultiIndex.from_product([['topic1'],
                                     ['location1', 'location2'],
                                     ['S1', 'S2', 'S3']],
                                    names=['top', 'loc', 'S'])
df = pd.DataFrame(np.random.randn(5, 6),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=header)
df
which produces:
top     topic1
loc  location1                        location2
S           S1         S2         S3         S1         S2         S3
a    -0.235613   1.064278  -2.147621   0.825380  -0.443313  -1.064031
b     0.404703   0.830838  -0.294387  -1.438028   0.836324  -2.427235
c     0.486648  -0.091448   1.246530  -0.005375   0.159478  -0.103404
d    -0.638070  -1.057061   0.596882  -1.007059  -0.654583  -0.618137
e    -0.850887  -1.660056   0.129954   1.204890  -1.457207   0.678393
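I would like to:
- rename the "top" and "loc" level names to "aaa" and "bbb"
- clone the whole of topic1 to create a new multi-column "topic2" whose values are, for example, multiplied by 2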
Best answer
You can use concat with the keys parameter to define the top level of the MultiIndex, then change the column level names with rename_axis or by direct assignment:
df = (pd.concat([df['topic1'], df['topic1'] * 2], keys=('topic1', 'topic2'), axis=1)
        .rename_axis(('aaa', 'bbb', df.columns.names[2]), axis=1))
Alternative:
df = pd.concat([df['topic1'], df['topic1'] * 2], keys=('topic1', 'topic2'), axis=1)
df.columns.names = ('aaa', 'bbb', df.columns.names[2])
print(df)
aaa     topic1                                                            topic2  \
bbb  location1                        location2                        location1
S           S1         S2         S3         S1         S2         S3         S1
a     0.511604  -0.217660  -0.521060   1.253270   1.104554  -0.770309   1.023207
b     0.632975  -1.322322  -0.936332   0.436361   1.233744   0.527565   1.265951
c    -0.369576   1.820059  -1.373630  -0.414554  -0.098443   0.904791  -0.739151
d     1.656726  -0.972017  -0.300689  -0.179819   0.472515   2.379975   3.313453
e    -0.053210  -0.180697   0.176240  -1.087404  -1.012181  -0.049870  -0.106421

aaa
bbb                         location2
S           S2         S3         S1         S2         S3
a    -0.435320  -1.042120   2.506541   2.209108  -1.540617
b    -2.644644  -1.872664   0.872723   2.467488   1.055129
c     3.640118  -2.747261  -0.829108  -0.196885   1.809582
d    -1.944034  -0.601379  -0.359638   0.945030   4.759950
e    -0.361395   0.352480  -2.174809  -2.024362  -0.099739
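To check the result without comparing random numbers by eye, the whole approach can be sketched as one self-contained script. The fixed seed, the variable name `out`, and the use of `Index.set_names` with a `level` argument are assumptions added for illustration and are not part of the answer above. `set_names` is shown as one possible way to rename only the first two levels while leaving the third ('S') untouched.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the seed is an assumption, used only for reproducibility.
rng = np.random.default_rng(0)
header = pd.MultiIndex.from_product(
    [['topic1'], ['location1', 'location2'], ['S1', 'S2', 'S3']],
    names=['top', 'loc', 'S'],
)
df = pd.DataFrame(rng.standard_normal((5, 6)),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=header)

# Clone topic1 under a new top-level key, doubling the values.
# After concat the new top level has no name yet.
out = pd.concat([df['topic1'], df['topic1'] * 2],
                keys=['topic1', 'topic2'], axis=1)

# Rename only levels 0 and 1; level 2 keeps its existing name 'S'.
out.columns = out.columns.set_names(['aaa', 'bbb'], level=[0, 1])

print(list(out.columns.names))                  # ['aaa', 'bbb', 'S']
print(out['topic2'].equals(out['topic1'] * 2))  # True
```

Assigning to a new name (`out`) rather than overwriting `df` keeps the original frame available for comparison.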
Regarding "python - change names and create new multi-columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49598289/