python - How do I go through a data frame and classify text as positive or negative?

Tags: python pandas twitter text-mining

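I currently have a pandas DataFrame that contains tokenized tweets.

I need to go through each tweet and determine whether it is positive or negative, so that I can add follow-up columns for the positive and negative words.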

Sample data:

tokenized_tweets = ['football, was, good, we, played, well', 'We, were, unlucky, today, bad, luck', 'terrible, performance, bad, game']

I need to run a loop over tokenized_tweets to determine whether each tweet is positive or negative.

For this example, the positive and negative words are as follows:

Positive_words = ['good', 'great'] 
Negative_words = ['terrible', 'bad']

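The desired output is a DataFrame containing the tweet, how many positive words each tweet contains, how many negative words each tweet contains, and whether the tweet is positive, negative, or neutral.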

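Whether a tweet is positive, negative, or neutral should be determined by whether it contains more positive or more negative keywords.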
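The rule described above can be sketched as a plain Python loop, before bringing pandas into it. This is only a minimal illustration: the helper name `classify` is hypothetical, and it assumes that each tweet is a single comma-separated string (as in the sample data), that matching should ignore case, and that a tie counts as neutral.

```python
def classify(tokenized_tweet, positive_words, negative_words):
    """Return (positive count, negative count, label) for one tweet string."""
    # Assumption: tokens are separated by commas, as in the sample data.
    tokens = [t.strip().lower() for t in tokenized_tweet.split(',')]
    pos = sum(t in positive_words for t in tokens)
    neg = sum(t in negative_words for t in tokens)
    if pos > neg:
        label = 'positive'
    elif neg > pos:
        label = 'negative'
    else:
        label = 'neutral'  # assumption: equal counts mean neutral
    return pos, neg, label


positive_words = {'good', 'great'}
negative_words = {'terrible', 'bad'}
tokenized_tweets = [
    'football, was, good, we, played, well',
    'We, were, unlucky, today, bad, luck',
    'terrible, performance, bad, game',
]

# One (positive, negative, overall) tuple per tweet.
results = [classify(t, positive_words, negative_words) for t in tokenized_tweets]
for tweet, (pos, neg, label) in zip(tokenized_tweets, results):
    print(tweet, pos, neg, label)
```

Because whole tokens are compared, a word such as "goodness" would not be counted as "good" here.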

Desired output:

Tokenized tweet                          positive words   negative words   overall
football, was, good, we, played, well    1                0                positive
We, were, unlucky, today, bad, luck      0                1                negative
terrible, performance, bad, game         0                2                negative

Best answer

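import pandas as pd
import numpy as np

# One row per tweet; each tweet is a comma-separated string of tokens.
df = pd.DataFrame({'tokenized_tweets': ['football, was, good, we, played, well', 'We, were, unlucky, today, bad, luck', 'terrible, performance, bad, game']})

Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']

# str.count treats its argument as a regex, so 'good|great' counts every
# occurrence of either word. The match is case-sensitive and also hits
# substrings (for example 'good' inside 'goodness').
df['positive words'] = df['tokenized_tweets'].str.count('|'.join(Positive_words))
df['negative words'] = df['tokenized_tweets'].str.count('|'.join(Negative_words))

# Conditions are evaluated in order; each one maps to the choice at the same index.
conditions = [
    (df['positive words'] > df['negative words']),
    (df['negative words'] > df['positive words']),
    (df['negative words'] == df['positive words'])
]

choices = [
    'positive',
    'negative',
    'neutral'
]

# The three conditions cover every row, so the default is never used.
df['overall'] = np.select(conditions, choices, default='')

df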

Output:

tokenized_tweets                      positive words   negative words   overall
0   football, was, good, we, played, well   1               0        positive
1   We, were, unlucky, today, bad, luck     0               1        negative
2   terrible, performance, bad, game        0               2        negative
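One caveat with the approach above: joining the words with `|` gives a plain regex alternation, so `str.count` matches substrings and is case-sensitive. If whole-word, case-insensitive matching is wanted, a variant along the following lines may help. It is a sketch rather than part of the original answer: the helper `word_pattern` is a hypothetical name, and the third tweet is made-up data added only to show the difference.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'tokenized_tweets': [
    'football, was, good, we, played, well',
    'We, were, unlucky, today, bad, luck',
    'Badly, played, goodness, knows, why',  # made-up row: no exact keyword
]})

positive_words = ['good', 'great']
negative_words = ['terrible', 'bad']


def word_pattern(words):
    """Build a regex that matches any of the words as a whole word only."""
    # re.escape guards against words that contain regex metacharacters.
    return r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'


# flags=re.IGNORECASE makes 'Good' and 'good' count the same.
df['positive words'] = df['tokenized_tweets'].str.count(
    word_pattern(positive_words), flags=re.IGNORECASE)
df['negative words'] = df['tokenized_tweets'].str.count(
    word_pattern(negative_words), flags=re.IGNORECASE)

# Two conditions are enough here; ties fall through to the default.
df['overall'] = np.select(
    [df['positive words'] > df['negative words'],
     df['negative words'] > df['positive words']],
    ['positive', 'negative'],
    default='neutral')

print(df)
```

With the word boundaries in place, "goodness" and "Badly" in the third row are not counted, so that row comes out as neutral.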

A similar question to "python - How do I go through a data frame and classify text as positive or negative?" can be found on Stack Overflow: https://stackoverflow.com/questions/49883501/
