python - Group by on a DataFrame using Python and pandas

Tags: python pandas dataframe numpy keyword

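Suppose I have a df like this:

id  name_x  st   string
1   xx      us   Being unacquainted with the chief raccoon was harming his prospects for promotion
2   xy      us1  The overpass went under the highway and into a secret world
3   xz      us   He was 100% into fasting with her until he understood that meant he couldn't eat
4   xu      us2  Random words in front of other random words create a random sentence
5   xi      us1  Pick up the pen and begin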

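Using Python and pandas, I want to group by the st column, count the name_x values in each group, and then extract the top 3 keywords from the string column.

For example, like this:

st   name_x_count  top1_word  top2_word  top3_word
us   2             word1      word2      word3
us1  2             word1      word2      word3
us2  1             word1      word2      word3

Is there any way to solve this task?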

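Best answer

I would first use groupby() to join the strings within each group, as you showed, then use collections.Counter and its most_common method, and finally assign the result back to the DataFrame. I am using x.lower() because otherwise "He" and "he" would be counted as different words (you can always remove it if that is what you want):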

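import collections
import pandas as pd
# Count name_x per st group and join each group's strings into one long string
output = df.groupby('st').agg(
    name_x_count = pd.NamedAgg('name_x','count'),
    string = pd.NamedAgg('string',' '.join))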
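
To make the intermediate result easier to picture, here is a minimal sketch of the same named aggregation on a cut-down frame. The three short strings are hypothetical stand-ins, not the question's data; only the column names come from the question.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's data
df = pd.DataFrame({
    "name_x": ["xx", "xy", "xz"],
    "st": ["us", "us1", "us"],
    "string": ["with the chief", "the overpass", "he was with her"],
})

# Named aggregation: count name_x per group and join the group's
# strings with a single space, in their original row order
grouped = df.groupby("st").agg(
    name_x_count=pd.NamedAgg("name_x", "count"),
    string=pd.NamedAgg("string", " ".join),
)
print(grouped)
```

Each st value becomes one row in the index, and the joined string column is only a temporary helper for the word counting that follows.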

After grouping, we use collections.Counter() to create the columns:

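top_words = output['string'].map(lambda s: [word for word, _ in collections.Counter(s.lower().split()).most_common(3)])
# Expand each 3-item list into its own column, aligned on the group index
output[['top1_word','top2_word','top3_word']] = pd.DataFrame(top_words.tolist(), index=output.index)
output = output.drop(columns='string')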
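
To show what lower() and most_common(3) do on a single string, here is a small sketch. The sentence is an assumed English wording of one of the sample strings, so treat it as illustrative rather than as the exact data.

```python
from collections import Counter

# Assumed English wording of one sample sentence (illustrative only)
text = "He was 100% into fasting with her until he understood that meant he couldn't eat"

# Without lower(), "He" and "he" are counted under separate keys
raw_counts = Counter(text.split())
# With lower(), both spellings are merged into one key
folded_counts = Counter(text.lower().split())

# most_common(n) returns (word, count) pairs, highest count first;
# words with equal counts keep the order in which they were first seen
top3 = [word for word, _ in folded_counts.most_common(3)]
print(raw_counts["He"], raw_counts["he"], folded_counts["he"])
print(top3)
```

Note that split() only splits on whitespace, so punctuation stays attached to a word ("eat." and "eat" would be different keys). If that matters, strip punctuation before counting.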

Output:

     name_x_count top1_word top2_word top3_word
st                                             
us              2        he      with       was
us1             2       the       and  overpass
us2             1    random     words        in
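
If some groups might contain fewer than three distinct words, the approach can be wrapped in a small helper that pads the result. This is only a sketch: top_words is a hypothetical helper name, and the sample strings are made up for the example.

```python
from collections import Counter

import pandas as pd


def top_words(text, n=3):
    """Hypothetical helper: n most frequent lowercase words, padded with None."""
    words = [word for word, _ in Counter(text.lower().split()).most_common(n)]
    return words + [None] * (n - len(words))


# Made-up sample data; only the column names follow the question
df = pd.DataFrame({
    "name_x": ["xx", "xy", "xz", "xu"],
    "st": ["us", "us1", "us", "us2"],
    "string": ["he was here", "the overpass and the highway", "he left", "begin"],
})

output = df.groupby("st").agg(
    name_x_count=pd.NamedAgg("name_x", "count"),
    string=pd.NamedAgg("string", " ".join),
)

cols = ["top1_word", "top2_word", "top3_word"]
# One row per group, one column per rank, aligned on the group index
top = pd.DataFrame(output["string"].map(top_words).tolist(),
                   index=output.index, columns=cols)
output = output.drop(columns="string").join(top)
print(output)
```

Here the us2 group has a single word, so its second and third word columns are left empty instead of raising an error.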

Regarding "python - Group by on a DataFrame using Python and pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74385812/
