Pandas rolling sum on a string column

Tags: pandas text rolling-sum

I'm using Python 3 with Pandas version 0.19.2.

I have a Pandas df as follows:

chat_id    line
1          'Hi.'
1          'Hi, how are you?.'
1          'I'm well, thanks.'
2          'Is it going to rain?.'
2          'No, I don't think so.'

I want to group by 'chat_id' and then do something like a rolling sum on 'line' to get the following result:
chat_id    line                     conversation
1          'Hi.'                    'Hi.'
1          'Hi, how are you?.'      'Hi. Hi, how are you?.'
1          'I'm well, thanks.'      'Hi. Hi, how are you?. I'm well, thanks.'
2          'Is it going to rain?.'  'Is it going to rain?.'
2          'No, I don't think so.'  'Is it going to rain?. No, I don't think so.'

I believe df.groupby('chat_id')['line'].cumsum() only works on numeric columns.
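For reference, a cumulative sum only needs a `+` operator, and in Python `+` on two strings concatenates them, so the operation itself is not inherently numeric. Below is a minimal pandas-free sketch of the idea using `itertools.accumulate`. The `rows` list is illustrative and is assumed to be sorted by chat_id, because `itertools.groupby` only groups consecutive equal keys; the single-space separator is also an assumption chosen to match the desired output.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Illustrative (chat_id, line) pairs, assumed already sorted by chat_id,
# because itertools.groupby only groups *consecutive* equal keys.
rows = [
    (1, "Hi."),
    (1, "Hi, how are you?."),
    (1, "I'm well, thanks."),
    (2, "Is it going to rain?."),
    (2, "No, I don't think so."),
]

conversation = []
for _chat_id, group in groupby(rows, key=itemgetter(0)):
    lines = [line for _, line in group]
    # accumulate() is a running reduction; joining with a space makes it
    # behave like a "cumsum" for strings within this chat.
    conversation.extend(accumulate(lines, lambda acc, nxt: acc + " " + nxt))

for (chat_id, _line), conv in zip(rows, conversation):
    print(chat_id, repr(conv))
```

The same running-reduction pattern is what the pandas answers below rely on, just applied per group inside a DataFrame.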

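I also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in each full conversation, but then I couldn't figure out how to unpack that list to create the 'rolling sum' style conversation column.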
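One way to finish the list-based idea is to skip building the full list and instead compute a running join inside each group. The sketch below assumes a pandas version with `SeriesGroupBy.transform` (a long-standing API); `running_join` is a hypothetical helper, and the single-space separator is an assumption chosen to match the desired output.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})


def running_join(lines: pd.Series, sep: str = " ") -> pd.Series:
    """Hypothetical helper: element i is lines[0..i] joined with sep."""
    joined = accumulate(lines, lambda acc, nxt: acc + sep + nxt)
    # Reuse the group's index so transform can align the result with df.
    return pd.Series(list(joined), index=lines.index)


# transform calls running_join once per chat_id group and returns a
# Series indexed like df, so it can be assigned directly as a new column.
df["conversation"] = df.groupby("chat_id")["line"].transform(running_join)
print(df)
```

Because the separator is inserted between lines rather than appended and stripped, whitespace at the ends of individual lines is preserved, and any separator (for example `"\n"`) can be passed as `sep`.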

Best answer

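Applying Series.cumsum within each group works for me. If you need a separator, append a space to each line first and strip the trailing one afterwards: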

# transform returns a result indexed like df, so it can be assigned as a column
df['new'] = df.groupby('chat_id')['line'].transform(lambda x: (x + ' ').cumsum().str.strip())
print(df)
   chat_id                   line                                          new
0        1                    Hi.                                          Hi.
1        1      Hi, how are you?.                        Hi. Hi, how are you?.
2        1      I'm well, thanks.      Hi. Hi, how are you?. I'm well, thanks.
3        2  Is it going to rain?.                        Is it going to rain?.
4        2  No, I don't think so.  Is it going to rain?. No, I don't think so.
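To see why the answer appends a space before `cumsum` and strips afterwards, here is a minimal sketch of the three steps on a single conversation. It assumes the chat_id 1 lines are held in an object-dtype Series named `x`, mirroring the lambda's argument.

```python
import pandas as pd

# The chat_id 1 lines; dtype=object keeps plain Python strings.
x = pd.Series(["Hi.", "Hi, how are you?.", "I'm well, thanks."], dtype=object)

step1 = x + " "            # every line now ends with one separator
step2 = step1.cumsum()     # running concatenation of the object column
step3 = step2.str.strip()  # drop the separator left at the very end

print(step2.tolist())
print(step3.tolist())
```

Since `str.strip()` removes whitespace at both ends of each accumulated string, it would also remove leading whitespace that genuinely belonged to the first line of a chat; a join that only places the separator between lines avoids that.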
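If the single quotes are actually part of the data, strip them first and add them back around each result:

df['line'] = df['line'].str.strip("'")
df['new'] = df.groupby('chat_id')['line'].transform(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)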
   chat_id                   line  \
0        1                    Hi.   
1        1      Hi, how are you?.   
2        1      I'm well, thanks.   
3        2  Is it going to rain?.   
4        2  No, I don't think so.   

                                             new  
0                                          'Hi.'  
1                        'Hi. Hi, how are you?.'  
2      'Hi. Hi, how are you?. I'm well, thanks.'  
3                        'Is it going to rain?.'  
4  'Is it going to rain?. No, I don't think so.' 

A similar question about a Pandas rolling sum on a string column is on Stack Overflow: https://stackoverflow.com/questions/43569056/
