python - How do I count the number of paragraphs in each article in a dataframe?

Tags: python pandas text nlp

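I want to count the number of paragraphs in each article in my dataframe. However, I end up with zero results in the list. Does anyone know how to fix it? Thanks a lot.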

Here is my code:

def count_paragraphs(df):
    paragraph_count = []
    linecount = 0
    for i in df.text:
        if i in ('\n','\r\n'):
            if linecount == 0:
                paragraphcount = paragraphcount + 1
    return paragraph_count

count_paragraphs(df)
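Why the loop above comes back empty: `for i in df.text` yields each whole article as one string, so `i in ('\n','\r\n')` compares the entire article to a single newline and is never true. Nothing is ever appended to `paragraph_count`, and the counter that is incremented (`paragraphcount`) is a different, undefined name, which would raise a `NameError` if that branch were ever reached. Below is a minimal sketch that keeps the loop-based approach. It assumes a paragraph is a run of non-blank lines separated by blank (empty or whitespace-only) lines; the toy dataframe is made up for illustration.

```python
import pandas as pd


def count_paragraphs(df):
    """Count paragraphs per article by scanning each article line by line.

    Assumption: a paragraph is a run of non-blank lines, and blank lines
    (empty or whitespace-only) separate paragraphs.
    """
    paragraph_counts = []
    for text in df.text:                  # each element is one whole article
        count = 0
        in_paragraph = False
        for line in text.splitlines():    # splits on both '\n' and '\r\n'
            if line.strip():              # non-blank line
                if not in_paragraph:      # first line of a new paragraph
                    count += 1
                    in_paragraph = True
            else:                         # a blank line ends the current paragraph
                in_paragraph = False
        paragraph_counts.append(count)
    return paragraph_counts


# Hypothetical toy data, not the asker's articles.
df = pd.DataFrame({"text": ["First para.\n\nSecond para.",
                            "One line only",
                            "A\r\n\r\nB\r\n\r\nC"]})
print(count_paragraphs(df))  # [2, 1, 3]
```

Using `str.splitlines()` avoids having to special-case `'\n'` versus `'\r\n'`, and tracking `in_paragraph` means several consecutive blank lines still count as a single break.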

df.text

0     On Saturday, September 17 at 8:30 pm EST, an e...
1     Story highlights "This, though, is certain: to...
2     Critical Counties is a CNN series exploring 11...
3     McCain Criticized Trump for Arpaio’s Pardon… S...
4     Story highlights Obams reaffirms US commitment...
5     Obama weighs in on the debate\n\nPresident Bar...
6     Story highlights Ted Cruz refused to endorse T...
7     Last week I wrote an article titled “Donald Tr...
8     Story highlights Trump has 45%, Clinton 42% an...
9     Less than a day after protests over the police...
10    I woke up this morning to find a variation of ...
11    Thanks in part to the declassification of Defe...
12    The Democrats are using an intimidation tactic...
13    Dolly Kyle has written a scathing “tell all” b...
14    The Haitians in the audience have some newswor...
15    The man arrested Monday in connection with the...
16    Back when the news first broke about the pay-t...
17    Chicago Environmentalist Scumbags\n\nLeftists ...
18    Well THAT’S Weird. If the Birther movement is ...
19    Former President Bill Clinton and his Clinton ...
Name: text, dtype: object

Best answer

Use Series.str.count:

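def count_paragraphs(df):
    # Counts the blank-line separators ("\n\n") in each article; an article
    # with N paragraphs separated by single blank lines yields N - 1.
    return df.text.str.count(r'\n\n').tolist()

count_paragraphs(df)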
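Note that `str.count(r'\n\n')` counts paragraph *breaks* rather than paragraphs: "A\n\nB\n\nC" gives 2, Windows line endings ("\r\n\r\n") contain no "\n\n" at all, and a run of three or more newlines is only partly counted. If you want the number of paragraphs itself, one possible variant (a sketch, assuming paragraphs are separated by one or more blank lines that may contain whitespace or '\r') is to split on a blank-line regex and count the non-empty pieces. The toy dataframe is hypothetical.

```python
import re

import pandas as pd

# Assumption: one or more blank lines (possibly containing spaces, tabs or
# '\r') separate paragraphs. '\s' also matches '\r' and '\n'.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def count_paragraphs(df):
    """Return the number of non-empty paragraphs in each article of df.text."""
    def count_one(text):
        if not isinstance(text, str):     # e.g. NaN for a missing article
            return 0
        parts = PARAGRAPH_BREAK.split(text)
        return sum(1 for p in parts if p.strip())
    return df.text.apply(count_one).tolist()


# Hypothetical toy data, not the asker's articles.
df = pd.DataFrame({"text": ["x\n\ny", "A\r\n\r\nB\r\n\r\nC", "single"]})
print(count_paragraphs(df))  # [2, 3, 1]
```

Filtering with `p.strip()` discards the empty pieces produced by leading or trailing blank lines, so those do not inflate the count.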

For more on "python - How do I count the number of paragraphs in each article in a dataframe?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/61452674/
