I would like to know how to get the frequency counts of the items in each column of a pandas DataFrame, as in the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 3, 5, 2],
                   'B': [10, 10, 10, 300, 400, 500],
                   'C': ['p', 'p', 'q', 'q', 'q', 'q']})
print(df)
   A    B  C
0  1   10  p
1  1   10  p
2  2   10  q
3  3  300  q
4  5  400  q
5  2  500  q
Desired output:
A        B          C
(1,2)    (10,3)     ('p', 2)
(2,2)    (300,1)    ('q', 4)
(3,1)    (400,1)
(5,1)    (500,1)
Best answer
You can build a list of Counter objects, one per column, and reconstruct the DataFrame from their (value, count) pairs:
from collections import Counter
c = [Counter(j for j in i).items() for i in df.values.T]
pd.DataFrame.from_records(c, index=df.columns).T
        A         B       C
0  (1, 2)   (10, 3)  (p, 2)
1  (2, 2)  (300, 1)  (q, 4)
2  (3, 1)  (400, 1)    None
3  (5, 1)  (500, 1)    None
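To see what each Counter contributes to the table above, here is a minimal sketch that counts column A on its own. The name column_a is only an illustration; it holds the same values as df['A'], written out as a plain list so the snippet runs without pandas.

```python
from collections import Counter

# Column 'A' from the question, written out as a plain list
column_a = [1, 1, 2, 3, 5, 2]

# Counter maps each distinct value to the number of times it occurs
counts = Counter(column_a)

# .items() yields (value, count) pairs in first-seen order (Python 3.7+)
pairs = list(counts.items())
print(pairs)  # [(1, 2), (2, 2), (3, 1), (5, 1)]
```

Each column yields one such list of pairs, and columns with fewer distinct values end up padded with None when the lists are assembled into a DataFrame.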
To sort each column by count, most frequent first:
from operator import itemgetter
c = [sorted(
         Counter(j for j in i).items(),
         key=itemgetter(1),
         reverse=True)
     for i in df.values.T]
pd.DataFrame.from_records(c, index=df.columns).T
        A         B       C
0  (1, 2)   (10, 3)  (q, 4)
1  (2, 2)  (300, 1)  (p, 2)
2  (3, 1)  (400, 1)    None
3  (5, 1)  (500, 1)    None
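As an alternative to the answer's Counter approach (not part of it), the sorted table can probably also be built with pandas alone, since Series.value_counts() already returns counts ordered from most to least frequent. The sketch below assumes the same df as in the question; the names columns and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 5, 2],
                   'B': [10, 10, 10, 300, 400, 500],
                   'C': ['p', 'p', 'q', 'q', 'q', 'q']})

# value_counts() returns counts sorted from most to least frequent;
# .items() turns each result into (value, count) pairs.
columns = {name: pd.Series(list(df[name].value_counts().items()))
           for name in df.columns}

# Series of different lengths are aligned on their index, so shorter
# columns are padded with NaN rather than None.
result = pd.DataFrame(columns)
print(result)
```

One difference from the Counter version is the padding value: missing cells come out as NaN here, which can be dropped per column with dropna().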
Regarding "python - How to get frequency counts of pandas columns?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58137537/