python - Select rows based on the last occurrence of a string in pandas

I have a pandas DataFrame that looks like this:

id   desc
1    Description
1    02.09.2017 15:00 abcd
1    this is a sample description
1    which is continued here also
1    
1    Description
1    01.09.2017 12:00 absd
1    this is another sample description
1    which might be continued here
1    or here
1
2    Description
2    09.03.2017 12:00 abcd
2    another sample again
2    and again
2
2    Description
2    08.03.2017 12:00 abcd
2    another sample again
2    and again times two

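Basically, there is an ID, and the rows hold information in a very unstructured format. I want to extract the description that follows the last "Description" row for each ID and store it in a single row. The resulting DataFrame would look like this: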

id  desc
1   this is another sample description which might be continued here or here
2   another sample again and again times two

As far as I can tell, I probably have to use groupby, but I don't know what to do after that.

Best answer

Find the position of the last Description row in each group and join the rows after it with str.cat:

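In [2840]: def lastjoin(x):
      ...:     # cumsum() peaks at the last 'Description' row; idxmax() returns the first label of that peak
      ...:     pos = x.desc.eq('Description').cumsum().idxmax()
      ...:     # skip the 'Description' row and the date line, then join the rest (relies on the default integer index)
      ...:     return x.desc.loc[pos+2:].str.cat(sep=' ')
      ...: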

In [2841]: df.groupby('id').apply(lastjoin)
Out[2841]:
id
1    this is another sample description which might...
2            another sample again and again times two
dtype: object
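
To see why cumsum().idxmax() lands on the last Description row, here is a minimal sketch on a single made-up group. The Series contents are illustrative placeholders rather than the question's data, and the default integer index is assumed.

```python
import pandas as pd

# Hypothetical single group: two "Description" blocks separated by a blank row.
desc = pd.Series(
    ["Description", "date a", "text a", "", "Description", "date b", "text b"]
)

is_marker = desc.eq("Description")  # True only on the marker rows
running = is_marker.cumsum()        # running count of markers seen so far
pos = running.idxmax()              # first label where the count reaches its maximum

print(running.tolist())  # [1, 1, 1, 1, 2, 2, 2]
print(pos)               # 4, the label of the last "Description" row
```

Because idxmax returns the first label at which the maximum occurs, and the running count only reaches its maximum at the final marker, pos points at the last Description row.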

To get the result as a column, use reset_index:

In [3216]: df.groupby('id').apply(lastjoin).reset_index(name='desc')
Out[3216]:
   id                                               desc
0   1  this is another sample description which might...
1   2          another sample again and again times two
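
For anyone who wants to reproduce this end to end, the following is a self-contained sketch rather than the answer's exact code. It rebuilds the sample data, assuming the blank rows are empty strings (the question does not say whether they are empty strings or NaN). The helper name last_description is hypothetical. It locates the last marker by position instead of by label, so it does not depend on the index being the default integer range, and it drops blank rows before joining.

```python
import pandas as pd

# Sample data rebuilt from the question; blank rows are ASSUMED to be empty strings.
rows = [
    (1, "Description"),
    (1, "02.09.2017 15:00 abcd"),
    (1, "this is a sample description"),
    (1, "which is continued here also"),
    (1, ""),
    (1, "Description"),
    (1, "01.09.2017 12:00 absd"),
    (1, "this is another sample description"),
    (1, "which might be continued here"),
    (1, "or here"),
    (1, ""),
    (2, "Description"),
    (2, "09.03.2017 12:00 abcd"),
    (2, "another sample again"),
    (2, "and again"),
    (2, ""),
    (2, "Description"),
    (2, "08.03.2017 12:00 abcd"),
    (2, "another sample again"),
    (2, "and again times two"),
]
df = pd.DataFrame(rows, columns=["id", "desc"])


def last_description(desc: pd.Series) -> str:
    """Join the text lines that follow the last 'Description' marker.

    Assumes the group contains at least one marker row and that the line
    right after each marker is a date line to be skipped.
    """
    # Positional offsets of all marker rows; the last one starts the final block.
    marker_positions = desc.eq("Description").to_numpy().nonzero()[0]
    last = marker_positions[-1]
    # Skip the marker row and the date line, then drop missing and blank rows.
    tail = desc.iloc[last + 2:].dropna()
    tail = tail[tail.str.strip().ne("")]
    return " ".join(tail)


result = df.groupby("id")["desc"].apply(last_description).reset_index()
print(result.to_string(index=False))
```

Selecting the desc column before apply means the function receives a plain Series per group, and reset_index turns the id index back into a regular column.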

For python - Select rows based on the last occurrence of a string in pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46771137/
