I have a df that looks like this:
words                                            col_a  col_b
I guess, because I have thought over that. Um,   1      0
That? yeah.                                      1      1
I don't always think you're up to something.     0      1
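I want to split df.words into separate rows wherever one of these punctuation characters appears:
(.,?!:;)
However, each new row should keep the col_a and col_b values from its original row. For example, the df above should end up like this:
words                                            col_a  col_b
I guess,                                         1      0
because I have thought over that.                1      0
Um,                                              1      0
That?                                            1      1
yeah.                                            1      1
I don't always think you're up to something.     0      1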
Best answer
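One approach is to use str.findall with the pattern (.*?[.,?!:;]), which matches any of these punctuation marks together with the characters preceding it (non-greedy), and then explode the resulting lists: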
# Collect every "text + punctuation" chunk per row, then give each chunk its own row
(df.assign(words=df.words.str.findall(r'(.*?[.,?!:;])'))
   .explode('words')
   .reset_index(drop=True))
   words                                            col_a  col_b
0  I guess,                                         1      0
1  because I have thought over that.                1      0
2  Um,                                              1      0
3  That?                                            1      1
4  yeah.                                            1      1
5  I don't always think you're up to something.     0      1
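For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the example df from the question and applies the same findall/explode approach. The extra str.strip() step is my own addition, not part of the answer above: since the pattern also captures the space that follows the previous punctuation mark, stripping is assumed to be wanted so that the values match the desired output exactly.

```python
import pandas as pd

# Example data reconstructed from the question
df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
    ],
    "col_a": [1, 1, 0],
    "col_b": [0, 1, 1],
})

# Non-greedy run of characters ending in one of the punctuation marks
pattern = r"(.*?[.,?!:;])"

result = (
    df.assign(words=df["words"].str.findall(pattern))  # list of chunks per row
      .explode("words")                                # one row per chunk
      .reset_index(drop=True)
)

# Assumption (not in the original answer): drop the leading space that
# each chunk after the first one carries over from the source string.
result["words"] = result["words"].str.strip()

print(result)
```

Two things to keep in mind with this pattern: any text after the last punctuation mark in a string is not matched and so gets dropped, and a row with no punctuation at all produces an empty list, which explode turns into NaN. If your data can contain such rows, you will probably want to handle them separately.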
Regarding "python - How to split long strings in a Pandas column by punctuation", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61331415/