python - Topic classification using a pickle file

Tags: python python-3.x scikit-learn topic-modeling

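I am trying to do topic classification using the pickle file of a trained model, but I get the error "CountVectorizer - Vocabulary wasn't fitted". Can someone guide me on how to resolve this error?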

Training dataset format:

Topic   originalSentence 
Topic1  He has arrived with his sister's two young children.
Topic2  The Lowells have been living off the Colby fortune
Topic3  Fred and Janice Gage, who live off the Lowell  fortune, which would have gone to Alan Colby

My training code:

import pandas as pd
from io import StringIO
from sklearn.feature_extraction.text import TfidfVectorizer, TfidfTransformer, CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
# Assumption: Incremental comes from dask_ml; the original post did not show this import.
from dask_ml.wrappers import Incremental
import numpy as np
import pickle

def train_model():
    df = pd.read_csv('/Users/ra51646/Desktop/classification_training.csv')
    df = df[pd.notnull(df['originalSentence'])]
    df.columns = ['topic', 'originalSentence']
    # Map each topic name to an integer id and back.
    df['category_id'] = df['topic'].factorize()[0]
    category_id_df = df[['topic', 'category_id']].drop_duplicates().sort_values('category_id')
    category_to_id = dict(category_id_df.values)
    id_to_category = dict(category_id_df[['category_id', 'topic']].values)
    tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2), stop_words='english')
    features = tfidf.fit_transform(df.originalSentence).toarray()
    labels = df.category_id
    X_train, X_test, y_train, y_test = train_test_split(df['originalSentence'], df['topic'], random_state=0)
    # The vocabulary is learned here, on the training sentences.
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(X_train)
    tfidf_transformer = TfidfTransformer()
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
    clf_SGD = SGDClassifier().fit(X_train_tfidf, y_train)
    clf_inc = Incremental(clf_SGD)
    final_model = clf_inc.fit(X_train_tfidf, y_train, classes=np.unique(y_train))
    # Only the classifier is pickled; count_vect and tfidf_transformer are not saved.
    pickle.dump(final_model, open("/Users/ra51646/Desktop/Pickle/topic_classification.pkl", "wb"))

My code for topic classification using the pickle file (this is where the error occurs):

def find_topic1():
    model = pickle.load(open("/Users/ra51646/Desktop/Pickle/topic_classification.pkl", "rb"))
    count_vect = CountVectorizer()
    answer = model.predict(count_vect.transform(["Lindy and her family went camping in the Outback"]))
    print(answer[0])
    return answer

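I get the error NotFittedError: CountVectorizer - Vocabulary wasn't fitted. in the find_topic1 method. Please help me resolve this error. How can I use my pickle file (the trained model) for topic classification?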
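The error can be reproduced in isolation, without the pickle file. The following is a minimal sketch, assuming a recent scikit-learn; the helper name transform_unfitted is hypothetical and only serves the illustration, and the exact wording of the message may differ between scikit-learn versions.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer


def transform_unfitted(text):
    """Try to transform text with a brand-new, never-fitted CountVectorizer."""
    fresh_vect = CountVectorizer()  # no vocabulary has been learned yet
    try:
        fresh_vect.transform([text])
    except NotFittedError as exc:
        # transform() needs a vocabulary, which only fit() or fit_transform() creates.
        return f"NotFittedError: {exc}"
    return "transformed without error"


message = transform_unfitted("Lindy and her family went camping in the Outback")
print(message)
```

This suggests the error comes from the vectorizer created inside find_topic1 rather than from the unpickled model itself.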

Best answer

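You are probably missing an argument to CountVectorizer (for example, the vocabulary learned during training). Without it, the count_vect variable in find_topic1 is a new, unfitted object that is independent of the pickled model, which causes the error. It is hard to be sure without an MCVE.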
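One way to keep the vectorizer and the model together is to pickle them as a single object. The sketch below is an illustration under stated assumptions, not the asker's exact setup: it uses a scikit-learn Pipeline with a plain SGDClassifier (the Incremental wrapper is left out), trains on the three sample sentences from the question instead of the CSV file, and pickles to an in-memory bytes object rather than to a path on disk.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Toy training data taken from the sample rows in the question.
sentences = [
    "He has arrived with his sister's two young children.",
    "The Lowells have been living off the Colby fortune",
    "Fred and Janice Gage, who live off the Lowell fortune, "
    "which would have gone to Alan Colby",
]
topics = ["Topic1", "Topic2", "Topic3"]

# The pipeline fits the vectorizer, the TF-IDF transformer and the
# classifier together, so all three end up in the same pickle.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier(random_state=0)),
])
pipeline.fit(sentences, topics)

# Serialize and restore; with a file, pickle.dump / pickle.load work the same way.
blob = pickle.dumps(pipeline)
restored = pickle.loads(blob)

# The restored pipeline accepts raw text because it carries the fitted vocabulary.
answer = restored.predict(["Lindy and her family went camping in the Outback"])
print(answer[0])
```

An alternative with the same effect would be to pickle count_vect and tfidf_transformer alongside final_model in the training function and load all three in find_topic1.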

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/52739532/
