I have a pandas DataFrame that looks like this:
id desc
1 Description
1 02.09.2017 15:00 abcd
1 this is a sample description
1 which is continued here also
1
1 Description
1 01.09.2017 12:00 absd
1 this is another sample description
1 which might be continued here
1 or here
1
2 Description
2 09.03.2017 12:00 abcd
2 another sample again
2 and again
2
2 Description
2 08.03.2017 12:00 abcd
2 another sample again
2 and again times two
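Basically, there is an ID, and the rows hold information in a very unstructured format. I want to extract the description that follows the last "Description" row and store it in a single row. The resulting DataFrame would look like this: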
id desc
1 this is another sample description which might be continued here or here
2 another sample again and again times two
As far as I can tell, I probably have to use groupby, but I don't know what to do after that.
Best answer
Find the position of the last Description row in each group, then join the rows that follow it with str.cat:
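In [2840]: def lastjoin(x):
      ...:     # index label of the last 'Description' row (assumes a default integer index)
      ...:     pos = x.desc.eq('Description').cumsum().idxmax()
      ...:     # skip the marker row and the timestamp row, then join the rest
      ...:     return x.desc.loc[pos+2:].str.cat(sep=' ')
      ...: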
In [2841]: df.groupby('id').apply(lastjoin)
Out[2841]:
id
1 this is another sample description which might...
2 another sample again and again times two
dtype: object
To get the result as a column, use reset_index:
In [3216]: df.groupby('id').apply(lastjoin).reset_index(name='desc')
Out[3216]:
id desc
0 1 this is another sample description which might...
1 2 another sample again and again times two
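For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It is a variant, not the code from the answer above: the helper name last_description is made up for illustration, the sample frame is rebuilt by hand from the question, and the blank lines are assumed to be empty strings (they could just as well be NaN in the real data). It slices by position inside each group, so it does not depend on the frame having a default integer index, and it drops blank lines before joining.

```python
import pandas as pd

# Sample rows rebuilt from the question. Assumption: blank lines are empty strings.
rows = [
    (1, "Description"),
    (1, "02.09.2017 15:00 abcd"),
    (1, "this is a sample description"),
    (1, "which is continued here also"),
    (1, ""),
    (1, "Description"),
    (1, "01.09.2017 12:00 absd"),
    (1, "this is another sample description"),
    (1, "which might be continued here"),
    (1, "or here"),
    (1, ""),
    (2, "Description"),
    (2, "09.03.2017 12:00 abcd"),
    (2, "another sample again"),
    (2, "and again"),
    (2, ""),
    (2, "Description"),
    (2, "08.03.2017 12:00 abcd"),
    (2, "another sample again"),
    (2, "and again times two"),
]
df = pd.DataFrame(rows, columns=["id", "desc"])


def last_description(desc):
    """Join the lines after the last 'Description' marker and its timestamp line.

    Hypothetical helper for illustration; returns '' if the group has no marker.
    """
    values = desc.tolist()
    # Positions (within the group) of every marker row.
    markers = [i for i, v in enumerate(values) if v == "Description"]
    if not markers:
        return ""
    # Skip the marker row and the timestamp row; drop blank or missing lines.
    tail = values[markers[-1] + 2:]
    return " ".join(v.strip() for v in tail if isinstance(v, str) and v.strip())


result = df.groupby("id")["desc"].apply(last_description).reset_index(name="desc")
print(result)
```

Like the answer above, this assumes that exactly one timestamp line follows each Description row; if that does not hold in the real data, the offset of 2 would need adjusting.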
This question, "python - Select rows based on the last occurrence of a string in pandas", comes from Stack Overflow: https://stackoverflow.com/questions/46771137/