python - How to count the number of times each word occurs in a sentence, to get a score for each sentence?

Tags: python nlp sentiment-analysis

I have a document of user survey responses:

Score    Comment
8        Rapid bureaucratic affairs. Reports for policy...
4        There needs to be communication or feed back f...
7        service is satisfactory
5        Good
5        There is no
10       My main reason for the product is competition ...
9        Because I have not received the results. And m...
5        no reason

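I want to find out which keywords go with higher scores and which go with lower scores.

My idea is to build a table of words (a dictionary of "word vectors") that stores, for each word, the scores it is associated with and the number of times each score occurs with that word.

Something like this: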

Word        Score   Count
Word1:      7       1
            4       2
Word2:      5       1
            9       1
            3       2
            2       1
Word3:      9       3
Word4:      8       1
            9       1
            4       2
...         ...     ...

Then, for each word, the average score is the mean of all the scores associated with that word.

To do this, I wrote the following code:

word_vec = {}
# col 1 is the word, col 2 is the score, col 3 is the number of times it occurs

for i in range(len(data)):
    sentence = data['SurveyResponse'][i].split(' ')
    for word in sentence:
        word_vec['word'] = word
        if word in word_vec:
            word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':(word_vec[word]['NumberOfTimes'] += 1)}
        else:
            word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':1}

But this code gives me the following error:

File "<ipython-input-144-14b3edc8cbd4>", line 9
    word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':(word_vec[word]['NumberOfTimes'] += 1)}
                                                                                                  ^
SyntaxError: invalid syntax

Can someone tell me the right way to do this?

Accepted answer

Try this code:

word_vec = {}
# Maps each word to {'Score': running total of scores, 'NumberOfTimes': occurrence count}

for i in range(len(data)):
    sentence = data['SurveyResponse'][i].split(' ')
    for word in sentence:
        # Note: the original `word_vec['word'] = word` line is dropped. It stored a plain
        # string under the key 'word', which breaks as soon as a comment contains "word".
        if word in word_vec:
            # Keep accumulating the total score for each word; this makes the average easy to compute later
            word_vec[word]['Score'] += data['SCORE'][i]
            word_vec[word]['NumberOfTimes'] += 1
        else:
            word_vec[word] = {'Score': data['SCORE'][i], 'NumberOfTimes': 1}

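To increment 'NumberOfTimes', update the existing entry in place: word_vec[word]['NumberOfTimes'] += 1. An augmented assignment such as += is a statement, not an expression, so it cannot appear inside a dict literal. That is what caused the SyntaxError.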
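
Once a total score and a count are stored per word, the average is just the total divided by the count. Here is a minimal sketch of that last step, assuming the survey is available as (score, comment) pairs. The `rows` sample, the function name `average_score_per_word`, and the lowercasing of words are illustrative assumptions, not part of the original question or data.

```python
from collections import defaultdict

# Hypothetical sample rows as (score, comment); not the asker's real data.
rows = [
    (8, "service is good"),
    (4, "service is slow"),
    (6, "Good"),
]


def average_score_per_word(rows):
    """Return {word: mean score over every occurrence of that word}."""
    # Each new word starts with a zero total and a zero count.
    totals = defaultdict(lambda: {"Score": 0, "NumberOfTimes": 0})
    for score, comment in rows:
        # Assumption: lower() merges "Good" and "good"; split() ignores repeated spaces.
        for word in comment.lower().split():
            totals[word]["Score"] += score
            totals[word]["NumberOfTimes"] += 1
    return {w: t["Score"] / t["NumberOfTimes"] for w, t in totals.items()}


averages = average_score_per_word(rows)

# List words from highest to lowest average score.
for word, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {avg:.2f}")
```

A word repeated within one comment is counted once per occurrence here, which matches the loop in the answer above. If each comment should count at most once per word, iterating over `set(comment.lower().split())` would be one way to do it.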

Source: a similar question on Stack Overflow, "python - How to count the number of times each word occurs in a sentence, to get a score for each sentence?": https://stackoverflow.com/questions/51185830/
