python - How do I count, for each value in one column, the occurrences of values in another column using Pandas?

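I have a dataframe with a unique index and the columns 'users', 'tweet_times' and 'tweet_id'.

For each user, I want to count how many tweet_times values occur more than once.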

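import pandas as pd

users = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
tweet_times = ['01-01-01 01:00', '02-02-02 02:00', '03-03-03 03:00', '09-09-09 09:00',
               '04-04-04 04:00', '04-04-04 04:00', '05-05-05 05:00', '09-09-09 09:00',
               '06-06-06 06:00', '06-06-06 06:00', '07-07-07 07:00', '07-07-07 07:00']
# Assumption: the sample as posted had no tweet_id column, so placeholder
# ids are used here to make the code further down runnable.
tweet_ids = list(range(len(users)))

d = {'users': users, 'tweet_times': tweet_times, 'tweet_id': tweet_ids}
df = pd.DataFrame(data=d)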

Expected output

A: 0
B: 1
C: 2

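I managed to get the desired output with the code below, except for A: 0, which is missing because user A has no rows left after the filter. Is there a more pythonic or more efficient way to do this?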

# group by both columns
df2 = pd.DataFrame(df.groupby(['users', 'tweet_times']).tweet_id.count())

# filter out values < 2
df3 = df2[df2.tweet_id > 1]

# turn multi-index level 1 into column
df3.reset_index(level=[1], inplace=True)

# final groupby
df3.groupby('users').tweet_times.count()

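Best answer

We can use crosstab to build a frequency table of users against tweet_times, check which counts are greater than 1 to get a boolean mask, and then sum this mask along axis=1. Because every user keeps a row in the table, users with no repeated times (such as A) show up with a count of 0.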

pd.crosstab(df['users'], df['tweet_times']).gt(1).sum(1)

users
A    0
B    1
C    2
dtype: int64
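
To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps on the sample data from the question. The names freq, mask, result and alt are only illustrative. The groupby variant at the end is not part of the answer above; it is an alternative that should give the same counts without building the full users-by-times table.

```python
import pandas as pd

users = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
tweet_times = ['01-01-01 01:00', '02-02-02 02:00', '03-03-03 03:00', '09-09-09 09:00',
               '04-04-04 04:00', '04-04-04 04:00', '05-05-05 05:00', '09-09-09 09:00',
               '06-06-06 06:00', '06-06-06 06:00', '07-07-07 07:00', '07-07-07 07:00']
df = pd.DataFrame({'users': users, 'tweet_times': tweet_times})

# Step 1: frequency table, one row per user and one column per distinct time.
freq = pd.crosstab(df['users'], df['tweet_times'])

# Step 2: boolean mask, True where a user posted at the same time more than once.
mask = freq.gt(1)

# Step 3: count the True cells in each row (axis=1 sums across the columns).
result = mask.sum(axis=1)
print(result)

# Alternative (an assumption, not from the answer): count each (user, time) pair,
# flag the pairs seen more than once, then sum the flags per user.
alt = (
    df.groupby(['users', 'tweet_times']).size()
      .gt(1)
      .groupby(level='users').sum()
)
print(alt)
```

The crosstab table has one column for every distinct timestamp, so with many distinct times it can become wide and mostly zeros. In that case the groupby variant may be the lighter option, though it is worth measuring on your own data.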

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/67489092/
