python - Calculating the frequency of strings in a large text file when pre-counts are given (efficiently)

I have a list of lists of the following form:

[['about70-130 characters long string', '332'], ['someotherrandomstring','2'], ['about70-130 characters long string', 32], ['someotherrandomstring', '3333']]

To do: I ultimately want to add up the sizes of all duplicate strings, like this:

[['about70-130 characters long string',364], ['someotherrandomstring',3335]]

I wrote brute-force code to solve this, but it takes a long time because the list contains about 2 million sublists. The very inefficient code I wrote is:

final = {} 
for element in both_list:
    size = int(element[1])
    if element[0] not in final.keys():
       final[element[0]] = size
    else:
       final[element[0]] += size

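I'm fairly sure there is a more time-efficient way to do this, but I can't come up with one. Any help or pointers in the right direction would be greatly appreciated. Thanks.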

Best answer

If you can use the third-party library pandas:

import pandas as pd
a=[['about70-130 characters long string', '332'], 
    ['someotherrandomstring','2'],['about70-130 characters long string', 32],['someotherrandomstring', '3333']]
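df=pd.DataFrame(a,columns=['label','counts'])
# the counts are a mix of str and int, so convert the column to int before summing
df.counts=df.counts.astype('int')
# sum the counts per label and return them as a {label: total} dict
df.groupby('label')['counts'].sum().to_dict()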

The following may be slightly faster than your solution:

a=[['about70-130 characters long string', '332'], 
    ['someotherrandomstring','2'],['about70-130 characters long string', 32],['someotherrandomstring', '3333']]
d={}
for i in a:
    # d.get(i[0], 0) yields 0 for a string not seen yet, so no if/else is needed
    d[i[0]]=d.get(i[0],0)+int(i[1])
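
As a further sketch of the same single-pass idea, the standard library's collections.defaultdict can hold the running totals, and the resulting dict can be turned back into the list-of-lists shape the question asks for. The helper name sum_counts is made up for this illustration, and the variable both_list is borrowed from the question's code; whether this is measurably faster than the loop above on 2 million entries is not something tested here.

```python
from collections import defaultdict


def sum_counts(pairs):
    """Sum the counts for each distinct string in an iterable of [string, count] pairs."""
    totals = defaultdict(int)  # a missing key starts at 0
    for label, count in pairs:
        totals[label] += int(count)  # counts may arrive as str or int
    return totals


# Sample data taken from the question.
both_list = [
    ['about70-130 characters long string', '332'],
    ['someotherrandomstring', '2'],
    ['about70-130 characters long string', 32],
    ['someotherrandomstring', '3333'],
]

totals = sum_counts(both_list)
# Convert back to the list-of-lists shape requested in the question.
result = [[label, total] for label, total in totals.items()]
print(result)
```

Each element is visited once and every dictionary lookup takes constant time on average, so the whole pass is linear in the length of the list.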

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52722980/
