I have tried other posts on this topic, but can't seem to find the right solution.
I have a dataframe that describes a conversation, split by speaker:
import pandas as pd
data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],[2, 'i am well'], [2, 'how are you?']]
df = pd.DataFrame(data, columns = ['speaker', 'turn'])
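What I want to do is merge adjacent rows that have the same speaker label. In other words, I want to merge the last two rows, because they should count as the same conversational turn.
data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'], [2, 'i am well', 'how are you?']]
I suspect the answer involves the groupby function, but so far my attempts to get it working have failed.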
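Best answer
Strings aren't handled well in pandas; the operations may look vectorized, but they really aren't. In any case, what you want to do at this stage is aggregate lists, and that format isn't well suited to a df, where you expect scalar values. Use itertools.groupby, which groups consecutive items that share a key: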
import itertools
from operator import itemgetter
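data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'], [2, 'i am well'],
        [2, 'how are you?']]
rebuilt_list = []
# groupby only groups *consecutive* rows with the same key (the speaker id)
for speaker, comment_group in itertools.groupby(data, itemgetter(0)):
    comments = [speaker]  # To make sure you have the speaker id as first value
    for comment in comment_group:
        comments.extend(comment[1:])
    rebuilt_list.append(comments)
# rebuilt_list is now:
# [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'], [2, 'i am well', 'how are you?']]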
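If the result needs to stay in a DataFrame, the same idea can be sketched in pandas itself. This is only an illustration, not part of the original answer: it assumes the `df` from the question, and the names `run_id` and `merged` are made up for the example. Each run of consecutive rows with the same speaker gets a label by comparing `speaker` with its shifted copy and taking a cumulative sum, and the turns in each run are collected into a list.

```python
import pandas as pd

data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],
        [2, 'i am well'], [2, 'how are you?']]
df = pd.DataFrame(data, columns=['speaker', 'turn'])

# A new run starts wherever the speaker differs from the previous row;
# the cumulative sum of those change flags labels each consecutive run.
# (run_id is a hypothetical helper name, not from the original post.)
run_id = (df['speaker'] != df['speaker'].shift()).cumsum().rename('run_id')

# Collapse each run into one row: keep the speaker, gather the turns in a list.
merged = (
    df.groupby(run_id)
      .agg(speaker=('speaker', 'first'), turn=('turn', list))
      .reset_index(drop=True)
)
print(merged)
```

As the answer points out, list-valued cells are not a natural fit for a DataFrame; if a single string per turn is enough, passing `' '.join` instead of `list` as the aggregation keeps the column scalar.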
Regarding python - merging adjacent rows based on a condition, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59195992/