python-3.x - Using an NLP model to find a specific object mentioned in a sentence

Tags: python-3.x machine-learning deep-learning data-analysis nlp

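I am new to machine learning and am developing a model with natural language processing. Users send requests to book a hotel room that includes a TV, air conditioning, and other amenities. I want to build a model that reads the content of a request and determines whether a TV is specifically required. I need to improve the accuracy of this NLP model.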

I developed the following model for this problem, but its accuracy is low.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('hotel.tsv', delimiter = '\t', quoting = 3)

# Cleaning the texts
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
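corpus = []
# Build the stemmer and the stopword set once instead of on every iteration
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))
# Processes the first 130 rows of the 'Review' column
for i in range(0, 130):
    # Keep letters only, lowercase, and split into words
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
    review = review.lower()
    review = review.split()
    # Stem each word and drop English stopwords
    review = [ps.stem(word) for word in review if word not in stop_words]
    review = ' '.join(review)
    corpus.append(review)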

# Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

I want to improve the accuracy of the model, so please help. How can I improve its accuracy? Any different ideas are also welcome.

Best answer

# I developed this for classifying hotel reviews, and it gave an accuracy of 90%.
# It uses an ANN (deep learning) on top of NLP preprocessing.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


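def negate(text):
    """Prefix the word that follows a negation with 'not_'.

    `text` is a list of words. A word counts as a negation if it
    contains "not", "n't", or "no" as a substring.
    """
    negation = False
    result = []
    for word in text:
        negated = "not_" + word if negation else word
        result.append(negated)

        # Only the word directly after a negation gets the prefix
        if any(neg in word for neg in ["not", "n't", "no"]):
            negation = True
        else:
            negation = False
    return result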
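
To see what the negation step produces, here is a small sketch that runs the function on two sentences. Because the check is a substring match, a word such as "know" also triggers it. The `negate_exact` variant and its `NEGATIONS` word set are my own illustration, not part of the answer above; they match whole words only, which may or may not suit your data.

```python
def negate(text):
    """Negation step from the answer: substring match on each word."""
    negation = False
    result = []
    for word in text:
        result.append("not_" + word if negation else word)
        negation = any(neg in word for neg in ["not", "n't", "no"])
    return result


# Hypothetical word list for illustration; extend it to fit your data.
NEGATIONS = {"not", "no", "never", "cannot"}


def negate_exact(tokens):
    """Variant that flags a negation only on an exact word match."""
    negation = False
    result = []
    for word in tokens:
        result.append("not_" + word if negation else word)
        negation = word in NEGATIONS
    return result


print(negate("the food was not good".split()))
print(negate("i know this place".split()))        # "know" contains "no"
print(negate_exact("i know this place".split()))  # unchanged
```

With the substring version, "i know this place" becomes `['i', 'know', 'not_this', 'place']`, while the exact-match version leaves it untouched. Both versions turn "not good" into `not good` followed by the merged token `not_good`.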

# Importing the dataset

dataset = pd.read_csv('Restaurant_Reviews.tsv',delimiter="\t",quoting=3)

# Cleaning the texts
import re
import nltk
nltk.download('stopwords')  # only needed once per environment
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
corpus=[]
ps=PorterStemmer()
stop_words=set(stopwords.words('english'))
for i in range(1000):
    review=re.sub('[^a-zA-Z]',' ',dataset.values[i,0])
    review=review.lower()
    review=review.split()
    review=negate(review)
    # Stem each word and drop English stopwords (same step as in the test code below)
    review=[ps.stem(word) for word in review if word not in stop_words]
    review=" ".join(review)
    corpus.append(review)

from sklearn.feature_extraction.text import CountVectorizer
cv=CountVectorizer(max_features=1500)
X=cv.fit_transform(corpus).toarray()
y=dataset.iloc[:,1].values
lm=cv.vocabulary_

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout

classifier=Sequential()
classifier.add(Dense(50,input_shape=(1500,),kernel_initializer='uniform',activation='relu'))
classifier.add(Dropout(rate=0.45))
classifier.add(Dense(30,kernel_initializer='uniform',activation='relu'))
classifier.add(Dropout(rate=0.45))
classifier.add(Dense(1,kernel_initializer='uniform',activation='sigmoid'))
classifier.compile(optimizer="adam",loss="binary_crossentropy",metrics=["accuracy"])

classifier.fit(X_train,y_train,batch_size=32,epochs=50)
y_pred=classifier.predict(X_test)

# Threshold the sigmoid outputs at 0.5 and flatten (n, 1) to (n,)
y_pred=(y_pred>0.5).astype(int).ravel()

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
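
The confusion matrix by itself does not give the accuracy figure, so here is a minimal sketch of how to read it off. The labels `y_true` and `y_hat` below are made-up values for illustration only, not results from the model above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration (1 = positive, 0 = negative)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_true, y_hat)

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(accuracy)
print(accuracy_score(y_true, y_hat))  # same value computed by scikit-learn
```

On a small dataset a single train/test split can give a noisy accuracy, so it may be worth repeating the split with a few different `random_state` values before comparing models.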


# Testing the model on a new review
review="i good to hang here"
review=re.sub('[^a-zA-Z]',' ',review)
review=review.lower()
review=review.split()
ps=PorterStemmer()
review=negate(review)
review=[ps.stem(word) for word in review if not word in set(stopwords.words('english'))]
review=" ".join(review)

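# Vectorize with the fitted CountVectorizer, then predict
k=cv.transform([review]).toarray()
# tt holds the sigmoid output; a value above 0.5 corresponds to class 1
tt=classifier.predict(k)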

Regarding "python-3.x - Using an NLP model to find a specific object mentioned in a sentence", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56563871/
