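I have two pandas DataFrames. The first has 3401 rows and 1 column; the second has 4 rows and 3 columns.
However, this is what I currently get (sample output from my script):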
DataFrame1   | DataFrame2
-------------+-------------------------
- email1     | -Id1  -Project1 -Descr1
- email2     | -Id2  -Project2 -Descr2
- email3     | -Id3  -Project3 -Descr3
- email4     | -Id4  -Project4 -Descr4
- email5     | -None -None     -None
...          | ...
- email3401  | -None -None     -None
What I want, for each email, is something like this:
- mail1, Id1, Project1, Descr1, Id2, Project2, ... , Id4, Project4, Descr4
- mail2, Id1, Project1, Descr1, Id2, Project2, ... , Id4, Project4, Descr4
... ...
- mail3401, Id1, Project1, Descr1, Id2, Project2, ... , Id4, Project4, Descr4
Thanks for your suggestions!
Here is my code:
import glob
import os

import pandas as pd

path = r"/Users/kd/path"
allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
file_names = []
j = 0
for file_ in allFiles:
    # File name without extension; its last character is used as a numeric suffix.
    name = os.path.splitext(file_)[0]
    i = int(name[-1])
    file_names.append(name)
    df = pd.read_csv(file_, index_col=None, header=0)
    if j > 0:
        # Append the new columns side by side to the frame stored under this suffix.
        globals()["self.dfInternautes%s" % i] = pd.concat(
            [globals()["self.dfInternautes%s" % i], df], axis=1
        )
    else:
        globals()["self.dfInternautes%s" % i] = df
    j += 1
Best answer
To turn a DataFrame into a single row, use stack. Then iterate over the result, creating a new column in the first DataFrame for each value.
>>> df1
0
0 email1
1 email2
2 email3
3 email4
4 email5
5 email6
>>> df2
0 1 2
0 Id1 Project1 Descr1
1 Id2 Project2 Descr2
2 Id3 Project3 Descr3
3 Id4 Project4 Descr4
>>> st = df2.stack()
>>> st
0 0 Id1
1 Project1
2 Descr1
1 0 Id2
1 Project2
2 Descr2
2 0 Id3
1 Project3
2 Descr3
3 0 Id4
1 Project4
2 Descr4
dtype: object
>>> df = df1.copy()
>>> for i in st.index: df[i] = st[i]
...
>>> df
0 (0, 0) (0, 1) (0, 2) (1, 0) (1, 1) (1, 2) (2, 0) (2, 1) \
0 email1 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
1 email2 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
2 email3 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
3 email4 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
4 email5 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
5 email6 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
(2, 2) (3, 0) (3, 1) (3, 2)
0 Descr3 Id4 Project4 Descr4
1 Descr3 Id4 Project4 Descr4
2 Descr3 Id4 Project4 Descr4
3 Descr3 Id4 Project4 Descr4
4 Descr3 Id4 Project4 Descr4
5 Descr3 Id4 Project4 Descr4
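The interactive session above can also be sketched as a standalone script that builds all the repeated columns in one step instead of assigning them one at a time. The small sample frames and the names `wide` and `result` are assumptions for illustration only; the real first frame would have 3401 rows. Note that this variant labels the columns 0, 1, 2, ... rather than with the (row, column) tuples shown above.

```python
import pandas as pd

# Small stand-ins for the two frames in the question (assumed sample data).
df1 = pd.DataFrame({0: ["email1", "email2", "email3"]})
df2 = pd.DataFrame([["Id1", "Project1", "Descr1"],
                    ["Id2", "Project2", "Descr2"]])

# Flatten df2 row by row into a single Series.
st = df2.stack()

# Repeat the flattened values once per email, aligned on df1's index.
wide = pd.DataFrame([st.to_numpy()] * len(df1), index=df1.index)

# Put the email column and the repeated values side by side;
# ignore_index=True renumbers the columns 0..n-1.
result = pd.concat([df1, wide], axis=1, ignore_index=True)
print(result)
```

Building the repeated block once and concatenating it avoids inserting columns in a Python loop, which may matter if the second frame grows well beyond four rows.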
Optionally, rename the columns:
df.columns = ['email', 'Id1', 'Project1', 'Descr1', 'Id2', 'Project2', 'Descr2', 'Id3', 'Project3', 'Descr3', 'Id4', 'Project4', 'Descr4']
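Typing the names out works for four rows, but the list can also be derived from the stacked index. The following is a minimal sketch, assuming the three columns of the second frame mean Id, Project and Descr in that order; the `fields` and `names` variables are hypothetical and not part of the original answer.

```python
import pandas as pd

# Assumed sample of the second frame from the question.
df2 = pd.DataFrame([["Id1", "Project1", "Descr1"],
                    ["Id2", "Project2", "Descr2"],
                    ["Id3", "Project3", "Descr3"],
                    ["Id4", "Project4", "Descr4"]])

# Assumed meaning of df2's columns 0, 1 and 2.
fields = ["Id", "Project", "Descr"]

# Each entry of the stacked index is a (row, column) pair.
st = df2.stack()
names = ["email"] + [f"{fields[col]}{row + 1}" for row, col in st.index]
print(names)
```

The resulting list can then be assigned with `df.columns = names`, provided the frame has one email column followed by one column per stacked value.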
Regarding python - merging Pandas DataFrames in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37855450/