My DataFrame has the following data:
callerid seq text
1236 2 I need to talk to x
1236 6 Issue 3 is this
1236 3 This is regarding abc
1236 5 Issue 2 is this
1236 4 Issue 1 is this
1236 1 Hi
1347 2 I need to talk to x
1347 6 Issue 3 is this
1347 3 This is regarding abc
1347 5 Issue 2 is this
1347 4 Issue 1 is this
1347 1 Hi
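I need to group the data by callerid, sort by seq within each group, concatenate the text, and write the result to another DataFrame.
The final output should look like this: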
callerid text
1236 Hi I need to talk to x This is regarding abc Issue 1 is this Issue 2 is this Issue 3 is this
1347 Hi I need to talk to x This is regarding abc Issue 1 is this Issue 2 is this Issue 3 is this
I tried the following code:
documenttext = dataextract.sort_values(['callerid','seq']).groupby('callerid')
documenttext1 = documenttext[['callerid','text']]
documenttext1 = (documenttext1.groupby('callerid')['text']
                 .apply(lambda x: ' '.join(set(x.dropna())))
                 .reset_index())
The first statement does not give me the complete sorted text. This is the output I get:
callerid seq text
1236 1 Hi
1236 2 I need to talk to x
1236 3 This is regarding abc
1347 1 Hi
1347 2 I need to talk to x
1347 3 This is regarding abc
Any help with this is appreciated.
Thanks in advance.
Best answer
As you guessed, the first step is to sort and the second step is to group. You can use ' '.join
as the aggfunc to concatenate the strings.
(df.sort_values('seq')
   .groupby('callerid', sort=False)['text']
   .agg(' '.join)
   .reset_index())
callerid text
0 1236 Hi I need to talk to x This is regarding abc I...
1 1347 Hi I need to talk to x This is regarding abc I...
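If you want to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame construction and the names `rows`, `df`, and `result` are my own assumptions, built from the sample data in the question; only the sort/groupby/agg chain comes from the answer.

```python
import pandas as pd

# Sample rows copied from the question (deliberately out of order).
rows = [
    (2, "I need to talk to x"),
    (6, "Issue 3 is this"),
    (3, "This is regarding abc"),
    (5, "Issue 2 is this"),
    (4, "Issue 1 is this"),
    (1, "Hi"),
]
# Both callers have the same six rows, as in the question.
df = pd.DataFrame(
    [(cid, seq, text) for cid in (1236, 1347) for seq, text in rows],
    columns=["callerid", "seq", "text"],
)

# Sort by seq first, then group. groupby keeps the row order inside each
# group, so the joined text follows seq. sort=False leaves the groups in
# order of first appearance instead of sorting the callerid keys.
result = (
    df.sort_values("seq")
      .groupby("callerid", sort=False)["text"]
      .agg(" ".join)
      .reset_index()
)
print(result.to_string(index=False))
```

Printing with `to_string` avoids the truncated `I...` display, so you can check the full concatenated text for each callerid.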
You should not group on 'seq', because you are trying to aggregate across it.
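A related point about the attempt in the question: wrapping the values in `set(...)` before joining discards the sorted order (a Python set is unordered) and also drops repeated lines. If the goal was only to skip missing text, `dropna()` alone should be enough. A rough sketch, assuming a hypothetical frame with one missing value (this data is made up for illustration and is not from the question; `join_in_order` and `combined` are names I chose):

```python
import pandas as pd

# Hypothetical data: one text value is missing for caller 1236.
df = pd.DataFrame({
    "callerid": [1236, 1236, 1236, 1347, 1347],
    "seq": [2, 3, 1, 2, 1],
    "text": ["I need to talk to x", None, "Hi", "I need to talk to x", "Hi"],
})

def join_in_order(texts: pd.Series) -> str:
    # dropna() skips missing text. Unlike set(), it keeps the row order
    # and does not remove repeated lines.
    return " ".join(texts.dropna())

# Sorting by both keys orders the callers and the lines within each caller.
combined = (
    df.sort_values(["callerid", "seq"])
      .groupby("callerid")["text"]
      .agg(join_in_order)
      .reset_index()
)
print(combined.to_string(index=False))
```

Sorting on both `callerid` and `seq` is optional here; sorting on `seq` alone with `sort=False`, as in the answer, gives the same text per caller.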
Source: the Stack Overflow question "python - Concatenate strings with pandas GroupBy based on the order of another column": https://stackoverflow.com/questions/56725840/