python - Concatenating Pandas DataFrames both side by side and top-to-bottom

Tags: python pandas indexing dataframe concatenation

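I have a problem. I want to build a new DataFrame from another DataFrame while avoiding duplicate rows. That is, if a row with the same email already exists, the new row should be concatenated side by side; otherwise it should be appended below. The problem is that I get an index error every time: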

pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

This is what I did:

if not self.data.empty:
    # Series.get_values() was removed in pandas 1.0; .values is the equivalent
    if data_frame_['Email'][0] in self.data['Email'].values:
        self.data = pd.concat([self.data, data_frame_], axis=1)
    else:
        self.data = pd.concat([self.data,data_frame_], axis=0)  
else: 
    self.data = data_frame_.copy()

end = time.time()

data_frame_ contains only one row, which is why I use

data_frame_['Email'][0]
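
One plausible explanation for the error, assuming each incoming one-row frame carries the default index [0]: stacking such frames with axis=0 leaves every row labelled 0, and a later axis=1 concat has to align on that non-unique index, which pandas refuses to do. The sketch below only demonstrates the duplicated labels; the frames and values are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frames, each with the default index [0],
# mimicking rows that arrive one at a time as in the question.
rows = [
    pd.DataFrame({"Email": ["kml@mail.com"], "Project1": [1], "Target1": [5000]}),
    pd.DataFrame({"Email": ["abc@abc.com"], "Project1": [7], "Target1": [5000]}),
]

# Stacking vertically keeps each row's original label.
stacked = pd.concat(rows, axis=0)
print(stacked.index.tolist())    # [0, 0]
print(stacked.index.is_unique)   # False

# ignore_index=True renumbers the rows, so the labels are unique again.
renumbered = pd.concat(rows, axis=0, ignore_index=True)
print(renumbered.index.tolist())   # [0, 1]
print(renumbered.index.is_unique)  # True
```

Even with a unique index, concatenating along axis=1 would repeat the existing column names rather than fill Project2 and Target2, so renumbering alone does not produce the desired layout.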

Example data (in data_frame_):

 Email          Project1  Target1  Project2  Target2
-----------------------------------------------------
 kml@mail.com          1     5000       NaN      NaN
 abc@abc.com           7     5000       NaN      NaN
 kml@mail.com          7     4000       NaN      NaN

What I want:

 Email          Project1  Target1  Project2  Target2
-----------------------------------------------------
 kml@mail.com          1     5000         7     4000
 abc@abc.com           7     5000       NaN      NaN

P.S.: I could do this with a dictionary, but to keep the code consistent I want to use DataFrames.

Thank you in advance.

Best answer

You can use pivot_table, but first number the repeated emails with cumcount to create the groups:

#rename columns
df.rename(columns={'Project1':'Project','Target1':'Target'}, inplace=True)

print(df)
          Email  Project  Target
0  kml@mail.com        1    5000
1   abc@abc.com        7    5000
2  kml@mail.com        7    4000

df['g'] = (df.groupby('Email').cumcount() + 1).astype(str)

df1 = df.pivot_table(index='Email', columns='g', values=['Project', 'Target'])
#Sort multiindex in columns 
df1 = df1.sort_index(axis=1, level=1)
#'reset' multiindex in columns
df1.columns = [''.join(col) for col in df1.columns]
print(df1)
              Project1  Target1  Project2  Target2
Email                                             
abc@abc.com        7.0   5000.0       NaN      NaN
kml@mail.com       1.0   5000.0       7.0   4000.0
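
For reference, the steps above can be put together into one self-contained sketch. It assumes the rows arrive as one-row frames that are already using the renamed Project and Target columns; the helper name widen_by_email and the list of sample rows are hypothetical and not part of the original answer.

```python
import pandas as pd

def widen_by_email(df):
    """Spread repeated Email rows into numbered Project/Target column pairs."""
    out = df.copy()
    # Number each occurrence of an email: first -> "1", second -> "2", ...
    out["g"] = (out.groupby("Email").cumcount() + 1).astype(str)
    wide = out.pivot_table(index="Email", columns="g", values=["Project", "Target"])
    # Order columns by occurrence number so Project1/Target1 precede Project2/Target2.
    wide = wide.sort_index(axis=1, level=1)
    # Flatten ("Project", "1") into "Project1".
    wide.columns = ["".join(col) for col in wide.columns]
    return wide

# Hypothetical one-row frames standing in for the incoming data.
rows = [
    pd.DataFrame({"Email": ["kml@mail.com"], "Project": [1], "Target": [5000]}),
    pd.DataFrame({"Email": ["abc@abc.com"], "Project": [7], "Target": [5000]}),
    pd.DataFrame({"Email": ["kml@mail.com"], "Project": [7], "Target": [4000]}),
]

# Stack once with a fresh unique index, then reshape to the wide layout.
df = pd.concat(rows, ignore_index=True)
result = widen_by_email(df)
print(result)
```

Because the occurrence number is stored as a string, an email that repeats more than nine times would sort "10" before "2". Keeping the counter as an integer and converting it to a string only when the column names are joined is one way around that.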

Regarding "python - Concatenating Pandas DataFrames both side by side and top-to-bottom", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38199803/
