python - Get groups of index values where the word count is greater than 1

Tags: python pandas dataframe pandas-groupby

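I am trying to get, for each word in a list, the index values at which it occurs together with its count, in particular for words that occur more than once.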

Ref="easy to get to know to easy of to"

From the input Ref, I built my table df1:

 word   Count
 easy   2
  to    4
 get    1
 know   1
  of    1

And df is:

Index   word
   0    easy
   1    to
   2    get
   3    to
   4    know
   5    to
   6    easy
   7    of
   8    to
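
For reproducibility, the two tables above could be built from Ref roughly as follows. This is only a sketch: the question does not show how df and df1 were actually created, so the whitespace split and the `sort=False` grouping (used here to keep words in order of first appearance) are assumptions.

```python
import pandas as pd

Ref = "easy to get to know to easy of to"

# One row per whitespace-separated token; the default RangeIndex
# (0..8) records each word's position in the sentence.
df = pd.DataFrame({"word": Ref.split()})
df.index.name = "Index"

# Occurrences per word, kept in order of first appearance
# (sort=False is an assumption made to match the table in the question).
df1 = df.groupby("word", sort=False).size().reset_index(name="Count")

print(df)
print(df1)
```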

So from these two tables, df and df1, what I want is:

Index          word   count
[0,6]          easy     2
[1,3,5,8]       to      4
[2]             get     1
[4]            know     1
[7]             of      1

It would be great if someone could help me.

Best answer

Given df as:

       word
Index      
0      easy
1        to
2       get
3        to
4      know
5        to
6      easy
7        of
8        to

First, use reset_index to move the DataFrame index into a column named "Index":

df = df.reset_index()

Next, use the following groupby and agg:

df.groupby('word')['Index'].agg([list,'count']).reset_index()

Output:

   word          list  count
0  easy        [0, 6]      2
1   get           [2]      1
2  know           [4]      1
3    of           [7]      1
4    to  [1, 3, 5, 8]      4
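
Putting the steps together, a self-contained version might look like the sketch below. The names `result` and `repeated` are illustrative and not part of the original answer. Two things are added here as assumptions: `sort=False`, so that words keep their order of first appearance as in the table the question asks for, and a final filter that keeps only words occurring more than once, as the title suggests.

```python
import pandas as pd

# Rebuild the example frame: one word per row, index named "Index".
df = pd.DataFrame({"word": "easy to get to know to easy of to".split()})
df.index.name = "Index"

result = (
    df.reset_index()                      # move the index into an "Index" column
      .groupby("word", sort=False)["Index"]
      .agg([list, "count"])               # positions of each word, and how many
      .rename(columns={"list": "Index"})  # match the column name in the question
      .reset_index()
)

# Optional: keep only the words that occur more than once.
repeated = result[result["count"] > 1]

print(result)
print(repeated)
```

Without `sort=False`, groupby sorts the group keys alphabetically, which is why the answer's output lists "get" before "to".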

Regarding "python - Get groups of index values where the word count is greater than 1", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51023435/
