python - How to stack sentences within a pandas dataframe while carrying their reference?

Tags: python string python-3.x pandas

I have a large pandas dataframe that contains a large number of documents:

    id  text
1   doc2    Google i...
2   doc3    Amazon...
3   doc4    This was...
...
n   docN    nice camara...

How can I stack all the documents into sentences, with each sentence carrying the id of the document it came from?

    id  text
1   doc1   Google is a great company.
2   doc1   It is in silicon valley.
3   doc1   Their search engine is the best
4   doc2   Amazon is a great store.
5   doc2   it is located in Seattle.
6   doc2   its new product is alexa. 
5   doc2   its expensive.
5   doc3   This was a great product.
...
n   docN   nice camara I really liked it.

I tried:

import nltk

def sentence(document):
    sentences = nltk.sent_tokenize(document.strip(' '))
    return sentences

df['sentence'] = df['text'].apply(sentence)
df.stack(level=0)

However, this did not work. Any idea how to stack the sentences while keeping track of the document each one came from?

Best answer

There is a solution to a problem similar to yours here: pandas: When cell contents are lists, create a row for each element in the list. Here is my adaptation of it for your specific task:

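import nltk
import pandas as pd

# Split each document into a list of sentences
df['sents'] = df['text'].apply(nltk.sent_tokenize)
# Expand each list into columns, then stack to one sentence per row (original index kept)
s = df.apply(lambda x: pd.Series(x['sents']), axis=1).stack().reset_index(level=1, drop=True)
s.name = 'sents'
# Drop the old columns and join the sentences back on the index, so each row keeps its id
df = df.drop(['sents', 'text'], axis=1).join(s)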
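
On pandas 0.25 or later, `DataFrame.explode` can do the same reshaping in one step. The following is a minimal sketch rather than the answer's original method. The sample data is made up for illustration, and `split_sentences` is a hypothetical regex-based stand-in for `nltk.sent_tokenize`, used only so the example runs without downloading NLTK data.

```python
import re
import pandas as pd

# Toy data standing in for the question's frame (contents are illustrative only)
df = pd.DataFrame({
    "id": ["doc1", "doc2"],
    "text": [
        "Google is a great company. It is in silicon valley.",
        "Amazon is a great store. It is located in Seattle. Its new product is alexa.",
    ],
})

def split_sentences(document):
    """Naive stand-in for nltk.sent_tokenize: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]

# One list of sentences per document
df["sents"] = df["text"].apply(split_sentences)

# explode() emits one row per list element and repeats the other columns (here, id)
result = (
    df.drop(columns="text")
      .explode("sents")
      .reset_index(drop=True)
)
print(result)
```

In real use, `split_sentences` would presumably be replaced by `nltk.sent_tokenize`, which handles abbreviations and other cases this simple regex does not.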

Regarding python - How to stack sentences within a pandas dataframe while carrying their reference?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41472234/
