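I have a DataFrame in which one column contains dictionaries. I want to count how many times each dictionary key occurs across the whole column.
One way to do this is: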
import pandas as pd
from collections import Counter

df = pd.DataFrame({"data": [{"weight": 3, "color": "blue"},
                            {"size": 5, "weight": 2},
                            {"size": 3, "color": "red"}]})
c = Counter()
for index, row in df.iterrows():
    for item in list(row["data"].keys()):
        c[item] += 1
print(c)
which gives
Counter({'weight': 2, 'color': 2, 'size': 2})
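Is there a faster way?
Best answer
A faster approach is to flatten the column with itertools.chain and build a Counter from the result. Iterating over a dictionary yields its keys, so the flattened sequence contains only the dictionary keys: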
from itertools import chain
Counter(chain.from_iterable(df.data.values.tolist()))
# Counter({'weight': 2, 'color': 2, 'size': 2})
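To make the flattening step explicit, here is a minimal sketch that also skips cells that are not dictionaries (for example None or NaN). That case is an assumption: the original question does not mention missing values. The helper name count_keys is hypothetical, and a plain list stands in for the column so the block runs without pandas.

```python
from collections import Counter
from itertools import chain

def count_keys(cells):
    """Count dictionary keys across an iterable of cells, ignoring non-dict cells."""
    # Iterating over a dict yields its keys, so chaining the dicts
    # produces one flat stream of keys for Counter to tally.
    dicts = (cell for cell in cells if isinstance(cell, dict))
    return Counter(chain.from_iterable(dicts))

# Stand-in for the column values, with two missing cells added (assumption).
cells = [{"weight": 3, "color": "blue"},
         {"size": 5, "weight": 2},
         None,
         float("nan"),
         {"size": 3, "color": "red"}]

key_counts = count_keys(cells)
print(key_counts)
# Counter({'weight': 2, 'color': 2, 'size': 2})
```

Because iterating over a pandas Series yields its values, count_keys(df["data"]) should work the same way on the DataFrame from the question.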
Timings:
def OP(df):
    c = Counter()
    for index, row in df.iterrows():
        for item in list(row["data"].keys()):
            c[item] += 1
%timeit OP(df)
# 570 µs ± 49.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit Counter(chain.from_iterable(df.data.values.tolist()))
# 14.2 µs ± 902 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
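The timings above come from a three-row DataFrame, where fixed overhead is a large share of each measurement. As a rough sketch for repeating the comparison on a larger input outside IPython, the standard timeit module can be used. The input size, the repeat count, and the function names with_iterrows and with_chain are assumptions chosen for illustration, and with_iterrows uses Counter.update in place of the explicit inner loop.

```python
import timeit
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical larger input: the question's three rows repeated 1,000 times.
rows = [{"weight": 3, "color": "blue"},
        {"size": 5, "weight": 2},
        {"size": 3, "color": "red"}] * 1000
big_df = pd.DataFrame({"data": rows})

def with_iterrows(df):
    """Row-by-row counting, equivalent in result to the question's loop."""
    c = Counter()
    for _, row in df.iterrows():
        c.update(row["data"].keys())
    return c

def with_chain(df):
    """Flatten the column of dicts into a stream of keys and count them."""
    return Counter(chain.from_iterable(df["data"].tolist()))

# Both approaches must agree before their timings are worth comparing.
same_result = with_iterrows(big_df) == with_chain(big_df)

runs = 5
t_iterrows = timeit.timeit(lambda: with_iterrows(big_df), number=runs)
t_chain = timeit.timeit(lambda: with_chain(big_df), number=runs)
print(f"iterrows: {t_iterrows / runs:.6f} s per call")
print(f"chain:    {t_chain / runs:.6f} s per call")
```

Absolute numbers depend on the machine and the pandas version, so no expected timings are given here.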
Source: "python - counting items in a pandas column of dictionaries", a similar question on Stack Overflow: https://stackoverflow.com/questions/57660403/