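I have a problem. I want to build a new DataFrame from another DataFrame while avoiding duplicate rows. That means if a row has an email that already exists, it should be concatenated side by side (as extra columns); otherwise it should be stacked top and bottom (as a new row). The problem is that I get an index error every time: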
pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
This is what I did:
if not self.data.empty:
    if data_frame_['Email'][0] in self.data['Email'].get_values():
        self.data = pd.concat([self.data, data_frame_], axis=1)
    else:
        self.data = pd.concat([self.data, data_frame_], axis=0)
else:
    self.data = data_frame_.copy()
end = time.time()
data_frame_ has only one row, which is why I use
data_frame_['Email'][0]
Sample data (the rows that arrive in data_frame_):
Email         Project1  Target1  Project2  Target2
-------------------------------------------------------------
kml@mail.com  1         5000     NaN       NaN
abc@abc.com   7         5000     NaN       NaN
kml@mail.com  7         4000     NaN       NaN
What I want is:
Email         Project1  Target1  Project2  Target2
-------------------------------------------------------------
kml@mail.com  1         5000     7         4000
abc@abc.com   7         5000     NaN       NaN
P.S.: I could do this with a dictionary, but to keep the code consistent I want to use DataFrames.
Thanks in advance.
Best answer
You can use pivot_table, but first create groups with cumcount:
#rename columns
df.rename(columns={'Project1':'Project','Target1':'Target'}, inplace=True)
print (df)
          Email  Project  Target
0  kml@mail.com        1    5000
1   abc@abc.com        7    5000
2  kml@mail.com        7    4000
df['g'] = (df.groupby('Email').cumcount() + 1).astype(str)
df1 = df.pivot_table(index='Email', columns='g', values=['Project', 'Target'])
#Sort multiindex in columns
df1 = df1.sort_index(axis=1, level=1)
#'reset' multiindex in columns
df1.columns = [''.join(col) for col in df1.columns]
print (df1)
              Project1  Target1  Project2  Target2
Email
abc@abc.com        7.0   5000.0       NaN      NaN
kml@mail.com       1.0   5000.0       7.0   4000.0
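For reference, the steps in this answer can be put together into one self-contained sketch. The function name widen_by_email and the way the sample frame is built are illustrative assumptions, not part of the original answer. The sketch also assumes all rows have already been collected into a single frame with columns Email, Project and Target, rather than being concatenated one at a time.

```python
import pandas as pd


def widen_by_email(df: pd.DataFrame) -> pd.DataFrame:
    """Spread repeated 'Email' rows across numbered Project/Target columns.

    Hypothetical helper for illustration; expects columns
    'Email', 'Project' and 'Target'.
    """
    out = df.copy()
    # Number each occurrence of an email: '1' for its first row, '2' for its second, ...
    out["g"] = (out.groupby("Email").cumcount() + 1).astype(str)
    # One row per email, one (value, occurrence) column pair per repeated row
    wide = out.pivot_table(index="Email", columns="g", values=["Project", "Target"])
    # Order columns by occurrence number so Project1/Target1 precede Project2/Target2
    wide = wide.sort_index(axis=1, level=1)
    # Flatten the two-level column index: ('Project', '1') -> 'Project1'
    wide.columns = ["".join(col) for col in wide.columns]
    return wide


# Sample rows, stacked top to bottom before reshaping
df = pd.DataFrame({
    "Email": ["kml@mail.com", "abc@abc.com", "kml@mail.com"],
    "Project": [1, 7, 7],
    "Target": [5000, 5000, 4000],
})

result = widen_by_email(df)
print(result)
```

Two things to keep in mind. pivot_table aggregates with the mean by default, which is why the values come back as floats; since each (Email, g) pair occurs only once here, nothing is actually averaged. Also, because g is a string, an email with ten or more rows would have its columns sorted lexicographically ('10' before '2'), so a different ordering step would be needed in that case.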
Regarding "python - Concatenating pandas DataFrames side by side / top and bottom rows at the same time", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/38199803/