python - How to merge two pandas DataFrames and aggregate a specific column

Tags: python pandas

I have two DataFrames:

         city  count    school
0    New York      1  school_3
1  Washington      1  School_4
2  Washington      1  School_5
3          LA      1  School_1
4          LA      1  School_4

         city  count    school
0    New York      1  School_3
1  Washington      1  School_1
2          LA      1  School_3
3          LA      2  School_4

I want to get this result:

         city  count    school
0    New York      2  school_3
1  Washington      1  School_1
2  Washington      1  School_4
3  Washington      1  School_5
4          LA      1  School_1
5          LA      1  School_3
6          LA      3  School_4

Here is my code.

import pandas as pd

d1 = [{'city':'New York', 'school':'school_3', 'count':1},
      {'city':'Washington', 'school':'School_4', 'count':1},
      {'city':'Washington', 'school':'School_5', 'count':1},
      {'city':'LA', 'school':'School_1', 'count':1},
      {'city':'LA', 'school':'School_4', 'count':1}]


d2 = [{'city':'New York', 'school':'School_3', 'count':1},
      {'city':'Washington', 'school':'School_1', 'count':1},
      {'city':'LA', 'school':'School_3', 'count':1},
      {'city':'LA', 'school':'School_4', 'count':2}]

x1 = pd.DataFrame(d1)
x2 = pd.DataFrame(d2)
# pd.merge joins on every shared column (city, count, school) by default,
# and no row matches on all three, so this just prints an empty DataFrame
print(pd.merge(x1, x2))

How can I get the aggregated result?

Best answer

You can concatenate the two DataFrames, group by city and school, and sum the count column:

>>> pd.concat([x1, x2]).groupby(["city", "school"], as_index=False)["count"].sum()
         city    school  count
0          LA  School_1      1
1          LA  School_3      1
2          LA  School_4      3
3    New York  School_3      1
4    New York  school_3      1
5  Washington  School_1      1
6  Washington  School_4      1
7  Washington  School_5      1

Note that New York appears twice because of a typo in the data (school_3 vs School_3).
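If the two spellings are meant to be the same school, one option is to normalize the school column before grouping. The following is a minimal sketch, not part of the original answer: it assumes that labels differing only in letter case refer to the same school, and it rebuilds x1 and x2 from the question's data so that it runs on its own.

```python
import pandas as pd

# Same data as in the question, rebuilt here so the example is self-contained.
x1 = pd.DataFrame({
    "city": ["New York", "Washington", "Washington", "LA", "LA"],
    "school": ["school_3", "School_4", "School_5", "School_1", "School_4"],
    "count": [1, 1, 1, 1, 1],
})
x2 = pd.DataFrame({
    "city": ["New York", "Washington", "LA", "LA"],
    "school": ["School_3", "School_1", "School_3", "School_4"],
    "count": [1, 1, 1, 2],
})

# Stack the rows of both frames into one.
combined = pd.concat([x1, x2], ignore_index=True)

# Assumption: "school_3" and "School_3" are the same school, so unify the case.
combined["school"] = combined["school"].str.capitalize()

# Sum the counts for each (city, school) pair.
result = combined.groupby(["city", "school"], as_index=False)["count"].sum()
print(result)
```

With the case unified, New York collapses into a single row whose count is 2. The groups still come out sorted by city and school, so the row order differs from the order shown in the question.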

The original question, "How to merge two pandas DataFrames and aggregate a specific column", is on Stack Overflow: https://stackoverflow.com/questions/28143694/
