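I'm trying out TextBlob to compute sentiment scores (polarity and subjectivity) for a list of articles in an Excel sheet I've compiled.
Here is a sample of the sheet: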
11/03/2004 04:03 At least 60 people were killed in three bomb attacks on crowded Madrid trains in Spain's worst-ever terrorist attack, said Efe newswire and other media. Red Cross said at least 200 people were injured. ``This is a massacre,'' said Socialist party leader Jose Luis Rodriguez Zapatero, who blamed Basque terrorist group ETA.
07/07/2005 04:41 London closed its subway system and evacuated all stations after emergency services were called to explosions in and around the financial district.
01/12/2009 04:00 American International Group, Inc. (AIG) today announced that it has closed two previously announced transactions with the Federal Reserve Bank of New York (FRBNY) that have reduced the debt AIG owes the FRBNY by $25 billion in exchange for the FRBNY’s acquisition of preferred equity interests in certain newly formed subsidiaries.
22/08/2013 11:38 NASDAQ shuts down for 3 hours due to a computer problem
Processing one line at a time, I've been able to use TextBlob in the simplest way:
analysis = TextBlob("NASDAQ shuts down for 3 hours due to a computer problem")
print(analysis.sentiment)
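`analysis.sentiment` returns a named tuple, `Sentiment(polarity, subjectivity)`. Polarity runs from -1.0 (negative) to 1.0 (positive), and subjectivity runs from 0.0 (objective) to 1.0 (subjective). Below is a minimal sketch of scoring several texts in a plain loop, before pandas comes in. The helpers `textblob_scores` and `score_texts` and the `analyze` parameter are hypothetical names introduced here so the scorer can be swapped for a stub.

```python
def textblob_scores(text):
    """Return (polarity, subjectivity) for one piece of text using TextBlob."""
    # Imported here so the module loads even where TextBlob is not installed.
    from textblob import TextBlob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def score_texts(texts, analyze=textblob_scores):
    """Score every text with `analyze`; return (text, polarity, subjectivity) tuples."""
    results = []
    for text in texts:
        polarity, subjectivity = analyze(text)  # pass the text itself, not a label
        results.append((text, polarity, subjectivity))
    return results
```

Assuming TextBlob is installed, `score_texts(["NASDAQ shuts down for 3 hours due to a computer problem"])` returns one `(text, polarity, subjectivity)` tuple per input.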
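I want to import the Excel file, which holds the date/time and the article text in two columns. Then I want to loop over every row, compute the polarity and subjectivity scores, and save them to a file.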
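The sample dates are day-first (22/08/2013 cannot be month-first), so it is safer to give pandas the format explicitly than to let it guess. The sketch below assumes the two columns are named `datetime` and `articles`; adjust these to your headers. To stay self-contained, it reads two sample rows from an in-memory CSV. For the workbook itself, `pd.read_excel(path)` returns a DataFrame the same way, though it needs the openpyxl package for .xlsx files.

```python
import io

import pandas as pd

# Two sample rows from the sheet, as they might look exported to CSV.
# The column names "datetime" and "articles" are assumptions.
SAMPLE_CSV = """datetime,articles
07/07/2005 04:41,London closed its subway system and evacuated all stations after emergency services were called to explosions in and around the financial district.
22/08/2013 11:38,NASDAQ shuts down for 3 hours due to a computer problem
"""


def load_events(source):
    """Read a two-column events file and parse the day-first date/time column."""
    df = pd.read_csv(source)
    # Dates are DD/MM/YYYY HH:MM, so state the format instead of letting pandas guess.
    df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")
    return df


events = load_events(io.StringIO(SAMPLE_CSV))
print(events.dtypes)
```

If the dates are stored as real Excel date cells rather than text, `read_excel` will typically return them already parsed, and the `to_datetime` step can be dropped.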
I tried adapting the Thomson Reuters news analytics code as follows:
import pandas as pd
import numpy as np
from textblob import TextBlob
path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()
df['Polarity'] = np.nan
df['Subjectivity'] = np.nan
pd.options.mode.chained_assignment = None
for idx, articles in enumerate(df['articles'].values):  # for each row in our df dataframe
    sentA = TextBlob("articles")  # pass the text only article to TextBlob to analyze
    df['Polarity'].iloc[idx] = sentA.sentiment.polarity  # write sentiment polarity back to df
    df['Subjectivity'].iloc[idx] = sentA.sentiment.subjectivity  # write sentiment subjectivity score back to df
df.head()
df.to_csv("out.csv", index=False)
The code doesn't work properly... I'm not getting any scores.
Any suggestions on how to get this done?
I'm new to Python (I'm using PyCharm). I mostly write code in Stata and Matlab.
Please help!
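The likely culprit in the question's loop is `TextBlob("articles")`. It passes the literal string "articles" rather than the loop variable `articles`, so TextBlob analyses the same one-word text on every iteration and every row receives an identical score. Separately, the chained `df['Polarity'].iloc[idx] = ...` writes are what `chained_assignment = None` silences, and under pandas' copy-on-write mode such chained writes no longer update `df`. Below is a minimal sketch of the loop with both points addressed, using `.at` to write each cell. `add_scores_loop`, `textblob_scores` and the `analyze` parameter are hypothetical names. Note that `.items()` yields index labels, which match the `.iloc` positions only for the default 0..n-1 index.

```python
import numpy as np
import pandas as pd


def textblob_scores(text):
    """Return (polarity, subjectivity) for one text using TextBlob."""
    from textblob import TextBlob  # lazy import so the sketch loads without TextBlob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def add_scores_loop(df, analyze=textblob_scores):
    """Fill Polarity/Subjectivity row by row, mirroring the loop in the question."""
    df["Polarity"] = np.nan
    df["Subjectivity"] = np.nan
    for idx, article in df["articles"].items():  # idx is the row label
        polarity, subjectivity = analyze(article)  # the variable, not the string "articles"
        # .at writes straight into df, so no chained assignment is involved.
        df.at[idx, "Polarity"] = polarity
        df.at[idx, "Subjectivity"] = subjectivity
    return df
```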
Best answer
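You should move the logic into a function and then use pd.Series.map() to apply that function to each row of the DataFrame. Using .map() or .apply() is cleaner than a manual loop, and it avoids writing each result back into the DataFrame one cell at a time.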
import pandas as pd
from textblob import TextBlob
path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
print(df.head())
# function to extract polarity and subjectivity from text
def process_text(text):
    blob = TextBlob(text)
    return blob.sentiment.polarity, blob.sentiment.subjectivity
# apply to each row of the 'articles' Series using the pd.Series.map method
df["polarity"], df["subjectivity"] = zip(*df.articles.map(process_text))
print(df.head())  # print() is needed to see the output when running a script in PyCharm
df.to_csv("out.csv", index=False)
Disclaimer: I haven't tested this.
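Since the answer's code is untested, here is a sketch of the same `Series.map` approach that also tolerates blank cells. pandas reads an empty cell as NaN (a float), and TextBlob only accepts strings, so `TextBlob(text)` raises a TypeError on such a row. The helper names, and the choice to give blank cells NaN scores, are assumptions.

```python
import math

import pandas as pd


def textblob_scores(text):
    """Return (polarity, subjectivity) for one text using TextBlob."""
    from textblob import TextBlob  # lazy import so the sketch loads without TextBlob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def safe_scores(text, analyze=textblob_scores):
    """Score non-empty strings; return (NaN, NaN) for blank or non-text cells."""
    if not isinstance(text, str) or not text.strip():
        return math.nan, math.nan
    return analyze(text)


def add_scores(df, column="articles", analyze=textblob_scores):
    """Add polarity and subjectivity columns using Series.map, as in the answer."""
    scores = df[column].map(lambda text: safe_scores(text, analyze))
    # Split the Series of (polarity, subjectivity) tuples into two columns.
    df["polarity"] = [polarity for polarity, _ in scores]
    df["subjectivity"] = [subjectivity for _, subjectivity in scores]
    return df
```

The tuples are split with two list comprehensions instead of `zip(*...)`. Unpacking `zip(*...)` into two names raises a ValueError on an empty DataFrame, whereas this version just adds empty columns. Usage would be `df = add_scores(pd.read_csv(path_to_file, encoding='latin-1'))`, followed by `df.to_csv("out.csv", index=False)`.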
For python - TextBlob - Looping over articles to calculate polarity and subjectivity scores, see the similar question on Stack Overflow: https://stackoverflow.com/questions/54916010/