I have this DataFrame:
source target
0 ape dog
1 ape hous
2 dog hous
3 hors dog
4 hors ape
5 dog ape
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 bird ape
11 fist ape
I am trying to produce frequency counts with this code:
df_count = df.groupby(['source', 'target']).size().reset_index().sort_values(0, ascending=False)
df_count.columns = ['source', 'target', 'weight']
I get the following result.
source target weight
2 ape hous 2
0 ape bird 1
1 ape dog 1
3 bird ape 1
4 bird fist 1
5 bird hous 1
6 dog ape 1
7 dog hous 1
8 fist ape 1
9 hors ape 1
10 hors dog 1
How can I modify the code so that direction doesn't matter, i.e. instead of ape bird 1 and bird ape 1, I get ape bird 2?
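Best answer
First, sort the values within each row, so that every pair is stored in the same order regardless of direction. Note that df.values.sort() sorts in place and relies on df.values being a view of the frame's underlying data. That holds for the single-dtype frame in this session, but it may not hold in other pandas versions or configurations.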
In [31]: df
Out[31]:
source target
0 ape dog
1 ape hous
2 dog hous
3 hors dog
4 hors ape
5 dog ape
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 bird ape
11 fist ape
In [32]: df.values.sort()
In [33]: df
Out[33]:
source target
0 ape dog
1 ape hous
2 dog hous
3 dog hors
4 ape hors
5 ape dog
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 ape bird
11 ape fist
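If the in-place sort does not change df in your environment, the same row-wise sort can be done on a copy. The following is a minimal sketch, assuming the frame has only the two string columns shown above; the name df_sorted is introduced here for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# np.sort returns a sorted copy; axis=1 sorts the two values within each row,
# so ("hors", "dog") and ("dog", "hors") both become ("dog", "hors").
pairs = np.sort(df[["source", "target"]].to_numpy(), axis=1)

# df_sorted is a hypothetical name; the original df is left unchanged.
df_sorted = pd.DataFrame(pairs, columns=["source", "target"], index=df.index)
print(df_sorted)
```

The groupby step can then be applied to df_sorted instead of df.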
Then, groupby on source, target, aggregate by size, and sort the result.
In [34]: (df.groupby(['source', 'target']).size().sort_values(ascending=False)
    ...:    .reset_index(name='weight'))
Out[34]:
source target weight
0 ape hous 2
1 ape dog 2
2 ape bird 2
3 dog hous 1
4 dog hors 1
5 bird hous 1
6 bird fist 1
7 ape hors 1
8 ape fist 1
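As a cross-check on the weights above, the same direction-independent counts can be reproduced without pandas. This is only a sketch using the standard library: each pair is normalized with sorted() and tallied with collections.Counter. The names edges and counts are illustrative and do not appear in the original answer.

```python
from collections import Counter

# The (source, target) pairs from the question, in their original order.
edges = [
    ("ape", "dog"), ("ape", "hous"), ("dog", "hous"), ("hors", "dog"),
    ("hors", "ape"), ("dog", "ape"), ("ape", "bird"), ("ape", "hous"),
    ("bird", "hous"), ("bird", "fist"), ("bird", "ape"), ("fist", "ape"),
]

# Sorting each pair gives one canonical key per undirected edge.
counts = Counter(tuple(sorted(edge)) for edge in edges)

# most_common() lists the edges from highest to lowest weight.
for (a, b), weight in counts.most_common():
    print(a, b, weight)
```

The three pairs with weight 2 (ape/hous, ape/dog, ape/bird) match the groupby result.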
Regarding python - Pandas DataFrame frequency, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41084113/