I have two dataframes, df1 and df2, that look like this:
#df1
counts freqs
categories
automatic 13 0.40625
manual 19 0.59375
#df2
counts freqs
categories
Straight Engine 18 0.5625
V engine 14 0.4375
Can anyone explain why pd.concat([df1, df2], axis = 0)
does not give me this:
counts freqs
categories
automatic 13 0.40625
manual 19 0.59375
Straight Engine 18 0.5625
V engine 14 0.4375
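Here is what I have tried:
1 - Using pd.concat()
I suspect that the way I built these dataframes may be the source of the problem. Here is how I ended up with these particular dataframes: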
# imports
import pandas as pd
from pydataset import data # pip install pydataset to get datasets from R
# load data
df_mtcars = data('mtcars')
# replace the dummy variables with more descriptive labels
# (.map avoids chained assignment, which newer pandas versions warn about or ignore)
df_mtcars['am'] = df_mtcars['am'].map({0: 'manual', 1: 'automatic'})
df_mtcars['vs'] = df_mtcars['vs'].map({0: 'Straight Engine', 1: 'V engine'})
# describe categorical variables
df1 = pd.Categorical(df_mtcars['am']).describe()
df2 = pd.Categorical(df_mtcars['vs']).describe()
I know that the "categories" are what causes the problem here, because df_con = pd.concat([df1, df2], axis = 0)
raises this error:
TypeError: categories must match existing categories when appending
But what confuses me is that this works fine:
# code
df_con = pd.concat([df1, df2], axis = 1)
# output:
counts freqs counts freqs
categories
automatic 13.0 0.40625 NaN NaN
manual 19.0 0.59375 NaN NaN
Straight Engine NaN NaN 18.0 0.5625
V engine NaN NaN 14.0 0.4375
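One way to see where the "categories" come from is to look at the index of each frame. The sketch below rebuilds stand-ins for df1 and df2 by hand (an assumption, so that it runs without pydataset) and assumes, as the error message suggests, that describe() leaves each frame with a CategoricalIndex. The two indexes carry different category sets. Appending along axis = 0 has to combine those two indexes into one, while axis = 1 only aligns the row labels.

```python
import pandas as pd

# Hand-built stand-ins for the question's df1 and df2 (values copied from the output above).
df1 = pd.DataFrame(
    {"counts": [13, 19], "freqs": [0.40625, 0.59375]},
    index=pd.CategoricalIndex(["automatic", "manual"], name="categories"),
)
df2 = pd.DataFrame(
    {"counts": [18, 14], "freqs": [0.5625, 0.4375]},
    index=pd.CategoricalIndex(["Straight Engine", "V engine"], name="categories"),
)

# The index is categorical, not a plain label index.
print(type(df1.index).__name__)  # CategoricalIndex

# Each index only knows about its own categories.
print(list(df1.index.categories))
print(list(df2.index.categories))

# The category sets do not overlap, so they cannot be appended as-is.
same_categories = set(df1.index.categories) == set(df2.index.categories)
print(same_categories)  # False
```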
2 - Using df.append()
Raises the same error as pd.concat()
3 - Using pd.merge()
Sort of works, but I lose the index:
# Code
df_merge = pd.merge(df1, df2, how = 'outer')
# Output
counts freqs
0 13 0.40625
1 19 0.59375
2 18 0.56250
3 14 0.43750
4 - Using pd.concat() on transposed dataframes
Since pd.concat() works with axis = 1, I thought I would get there using transposed dataframes.
# df1.T
categories automatic manual
counts 13.00000 19.00000
freqs 0.40625 0.59375
# df2.T
categories Straight Engine V engine
counts 18.0000 14.0000
freqs 0.5625 0.4375
But still no luck:
# code
df_con = pd.concat([df1.T, df2.T], axis = 1)
>>> TypeError: categories must match existing categories when appending
By the way, what I was hoping for is:
categories automatic manual Straight Engine V engine
counts 13.00000 19.00000 18.0000 14.0000
freqs 0.40625 0.59375 0.5625 0.4375
It still works with axis = 0, though:
# code
df_con = pd.concat([df1.T, df2.T], axis = 0)
# Output
categories automatic manual Straight Engine V engine
counts 13.00000 19.00000 NaN NaN
freqs 0.40625 0.59375 NaN NaN
counts NaN NaN 18.0000 14.0000
freqs NaN NaN 0.5625 0.4375
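But that is still far from what I am trying to achieve.
Now I am thinking it might be possible to strip the "categories" information from df1 and df2, but I have not yet found out how to do that.
Thank you for any other suggestions!
Best answer
Try this: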
pd.concat([df1.reset_index(),df2.reset_index()],ignore_index=True)
Output:
categories counts freqs
0 automatic 13 0.40625
1 manual 19 0.59375
2 Straight Engine 18 0.56250
3 V engine 14 0.43750
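As a rough illustration of why this works: reset_index() moves the categorical index into an ordinary "categories" column and leaves a default integer index behind, so concat no longer has to append two mismatched categorical indexes. The sketch below uses hand-built stand-ins for df1 and df2 (an assumption, so that it runs without pydataset).

```python
import pandas as pd

# Hand-built stand-ins for the question's df1 and df2.
df1 = pd.DataFrame(
    {"counts": [13, 19], "freqs": [0.40625, 0.59375]},
    index=pd.CategoricalIndex(["automatic", "manual"], name="categories"),
)
df2 = pd.DataFrame(
    {"counts": [18, 14], "freqs": [0.5625, 0.4375]},
    index=pd.CategoricalIndex(["Straight Engine", "V engine"], name="categories"),
)

# After reset_index() the labels live in a regular column named 'categories'.
flat1 = df1.reset_index()
flat2 = df2.reset_index()

# ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1, 0, 1.
stacked = pd.concat([flat1, flat2], ignore_index=True)
print(stacked)
```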
To get the categories back as the index, do this:
pd.concat([df1.reset_index(),df2.reset_index()],ignore_index=True).set_index('categories')
Output:
counts freqs
categories
automatic 13 0.40625
manual 19 0.59375
Straight Engine 18 0.56250
V engine 14 0.43750
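Another option, closer to the question's idea of stripping the "categories" information, is to cast each index to plain objects before concatenating. This is only a sketch: with_plain_index is a hypothetical helper name, and df1 and df2 are hand-built stand-ins for the frames in the question. Newer pandas releases may handle mismatched categorical indexes differently from the version that raised the TypeError, so treat the cast as a defensive step rather than a requirement.

```python
import pandas as pd

# Hand-built stand-ins for the question's df1 and df2.
df1 = pd.DataFrame(
    {"counts": [13, 19], "freqs": [0.40625, 0.59375]},
    index=pd.CategoricalIndex(["automatic", "manual"], name="categories"),
)
df2 = pd.DataFrame(
    {"counts": [18, 14], "freqs": [0.5625, 0.4375]},
    index=pd.CategoricalIndex(["Straight Engine", "V engine"], name="categories"),
)


def with_plain_index(df):
    """Return a copy of df whose index holds plain labels instead of categories.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    out.index = out.index.astype(object)
    return out


# Stack the rows; the labels stay in the index the whole time.
stacked = pd.concat([with_plain_index(df1), with_plain_index(df2)], axis=0)
print(stacked)

# Transposing gives the wide layout the question was hoping for.
wide = stacked.T
print(wide)
```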
For more details, see the docs.
About python - Unexpected error when trying to concatenate dataframes with categorical data: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50965581/