I have a data frame of the following form:
df = [["john","2019","30.2"] , ["john","2019","40"] , ["john","2020","50.3"] ,
["amy","2019","60"] , ["amy","2019","20"] , ["amy","2020","40.1"]]
The result I want is a list in which the last element is summed across all rows whose first two elements are equal:
> [["john", "2019", "70.2"] , ["john","2020","50.3"] , ["amy","2019","80"] , ["amy","2020","40.1"]]
What I tried was a for loop that checks each condition for equality and, when both hold, sums the last element. Here is some pseudocode:
for i in df[i]:
    if df[i][0] == df[i+1][0] and df[i][1] == df[i+1][1]: #if both conditions are true
        sum1 = sum(float(df[i][2]))
        lst = []
        lst.append(df[i][0])
        lst.append(df[i][1])
        lst.append(str(sum1))
Edit: I would prefer a solution that does not use any packages.
Best answer
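The following code does not use any packages. Since Python 3.7, all dictionaries are insertion-ordered, and the code relies on this so that the final result keeps the groups in the order in which they first appear. If for some reason your Python is older than 3.7, let me know and I will modify the code to handle the ordering explicitly instead of relying on this language feature.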
df = [["john","2019","30.2"], ["john","2019","40"], ["john","2020","50.3"],
["amy","2019","60"], ["amy","2019","20"], ["amy","2020","40.1"]]
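r = {}
for *a, b in df:
    # a holds every field except the last; b is the value to sum
    a = tuple(a)  # lists are not hashable, so use a tuple as the dict key
    if a not in r:
        r[a] = 0
    r[a] += float(b)
# turn each (key, total) pair back into a list of strings
r = [list(k) + [str(v)] for k, v in r.items()]
print(r)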
Output:
[['john', '2019', '70.2'], ['john', '2020', '50.3'], ['amy', '2019', '80.0'], ['amy', '2020', '40.1']]
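Two details of the output above differ from what the question asked for: the order depends on dictionary insertion order, and the whole-number total is printed as '80.0' rather than '80'. The following is a minimal sketch of one way to handle both, still without any packages. The names sum_by_key, order and fmt are made up for this illustration and are not part of the original answer, and dropping the ".0" is an assumption about the desired formatting.

```python
def sum_by_key(rows):
    """Sum the last field of each row, grouped by all preceding fields."""
    totals = {}  # key tuple -> running sum
    order = []   # keys in first-seen order, so dict ordering is not relied on
    for *key, value in rows:
        key = tuple(key)
        if key not in totals:
            totals[key] = 0.0
            order.append(key)
        totals[key] += float(value)

    def fmt(x):
        # Assumption: whole-number totals should print without ".0" (80.0 -> "80")
        return str(int(x)) if x.is_integer() else str(x)

    return [list(key) + [fmt(totals[key])] for key in order]


df = [["john", "2019", "30.2"], ["john", "2019", "40"], ["john", "2020", "50.3"],
      ["amy", "2019", "60"], ["amy", "2019", "20"], ["amy", "2020", "40.1"]]
result = sum_by_key(df)
print(result)
```

Because the keys are recorded in a separate list as they are first seen, this version does not depend on the dictionary ordering guarantee and should behave the same on Python 3 versions older than 3.7.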
This question, about a Python loop for multi-condition summation, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/70402535/