python - TextBlob - Looping over articles to compute polarity and subjectivity scores

Tags: python textblob

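I'm looking into TextBlob to compute sentiment scores (polarity, subjectivity) for a list of articles I've compiled in an Excel worksheet.

Here is a sample of the worksheet: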

11/03/2004 04:03 At least 60 people were killed in three bomb attacks on crowded Madrid trains in Spain's worst-ever terrorist attack, said Efe newswire and other media. Red Cross said at least 200 people were injured. ``This is a massacre,'' said Socialist party leader Jose Luis Rodriguez Zapatero, who blamed Basque terrorist group ETA.

07/07/2005 04:41 London closed its subway system and evacuated all stations after emergency services were called to explosions in and around the financial district.

01/12/2009 04:00 American International Group, Inc. (AIG) today announced that it has closed two previously announced transactions with the Federal Reserve Bank of New York (FRBNY) that have reduced the debt AIG owes the FRBNY by $25 billion in exchange for the FRBNY’s acquisition of preferred equity interests in certain newly formed subsidiaries.

22/08/2013 11:38 NASDAQ shuts down for 3 hours due to a computer problem

By running each row individually, I've been able to use TextBlob in the simplest way:

from textblob import TextBlob

analysis = TextBlob("NASDAQ shuts down for 3 hours due to a computer problem")
print(analysis.sentiment)

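I'd like to import the Excel file, which has the date and time and the articles in two columns, then loop over each row to compute the polarity and subjectivity scores and save them to a file.

I tried adapting the code on Thomson Reuters News Analytics in this way: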

import pandas as pd
import numpy as np
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

df['Polarity'] = np.nan
df['Subjectivity'] = np.nan

pd.options.mode.chained_assignment = None

for idx, articles in enumerate(df['articles'].values):  # for each row in our df dataframe
    sentA = TextBlob("articles")  # pass the text only article to TextBlob to analyze
    df['Polarity'].iloc[idx] = sentA.sentiment.polarity  # write sentiment polarity back to df
    df['Subjectivity'].iloc[idx] = sentA.sentiment.subjectivity  # write sentiment subjectivity score back to df
df.head()

df.to_csv("out.csv", index=False)

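The code doesn't work properly... I'm not getting any scores.

Any suggestions on how to get this done?

I'm new to Python (I'm using PyCharm). I mostly write code in Stata and Matlab.

Please help!

Best answer

You should move the logic into a function and then use pd.Series.map() to apply that function to each row of the DataFrame. Using .map() or .apply() is faster and cleaner than looping manually.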

import pandas as pd
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

# function to extract polarity and subjectivity from text
def process_text(text):
    blob = TextBlob(text)
    return blob.sentiment.polarity, blob.sentiment.subjectivity

# apply to each row of the 'articles' Series using the pd.Series.map method
df["polarity"], df["subjectivity"] = zip(*df.articles.map(process_text))

df.head()

df.to_csv("out.csv", index=False)

Disclaimer: I haven't tested this.
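
For context on why the loop in the question produced no useful scores: TextBlob("articles") analyses the literal word "articles" rather than the loop variable, so every row gets the same result. Writing back through df['Polarity'].iloc[idx] is also chained assignment, which may not update df reliably depending on the pandas version. Below is a minimal sketch of the same map-based idea that additionally skips empty cells and parses the day-first timestamps shown in the sample. The helper names (load_events, add_sentiment_columns, textblob_scores) and the "date" column name are assumptions made for illustration; only the "articles" column comes from the question.

```python
import pandas as pd


def textblob_scores(text):
    """Return (polarity, subjectivity) for one piece of text using TextBlob."""
    # Imported here so the pandas helpers below can be exercised with
    # another scorer when TextBlob is not installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def load_events(source, date_col="date", **read_csv_kwargs):
    """Read the CSV and parse timestamps such as '22/08/2013 11:38'.

    'date' is an assumed column name; adjust it to match the real header.
    """
    df = pd.read_csv(source, **read_csv_kwargs)
    df[date_col] = pd.to_datetime(df[date_col], format="%d/%m/%Y %H:%M")
    return df


def add_sentiment_columns(df, text_col="articles", scorer=textblob_scores):
    """Return a copy of df with 'polarity' and 'subjectivity' columns added."""

    def score_cell(value):
        # pandas represents empty cells as NaN (a float), which TextBlob
        # cannot process, so non-string and blank cells get NaN scores.
        if not isinstance(value, str) or not value.strip():
            return (float("nan"), float("nan"))
        return scorer(value)

    scores = df[text_col].map(score_cell)  # Series of (polarity, subjectivity)
    out = df.copy()
    out["polarity"] = [pair[0] for pair in scores]
    out["subjectivity"] = [pair[1] for pair in scores]
    return out
```

With the file from the question, usage would look something like add_sentiment_columns(load_events(path_to_file, encoding="latin-1")).to_csv("out.csv", index=False). Passing the scorer as a parameter is just a convenience so the DataFrame logic can be checked separately from TextBlob.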

Source: this question on Stack Overflow: https://stackoverflow.com/questions/54916010/
