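I am practicing on the Amazon reviews dataset; the code is below. Basically, I cannot add a column (a pandas array) to the CSR matrix I get after applying BoW. Even though the number of rows in both matrices matches, I cannot get the stacking to work.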
import sqlite3
import pandas as pd
import numpy as np
import nltk
import string
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from nltk.stem.porter import PorterStemmer
from sklearn.manifold import TSNE
#Create Connection to sqlite3
con = sqlite3.connect('C:/Users/609316120/Desktop/Python/Amazon_Review_Exercise/database/database.sqlite')
filtered_data = pd.read_sql_query("""select * from Reviews where Score != 3""", con)
def partition(x):
    if x < 3:
        return 'negative'
    return 'positive'
actualScore = filtered_data['Score']
actualScore.head()
positiveNegative = actualScore.map(partition)
positiveNegative.head(10)
filtered_data['Score'] = positiveNegative
filtered_data.head(1)
filtered_data.shape
display = pd.read_sql_query("""select * from Reviews where Score !=3 and Userid="AR5J8UI46CURR" ORDER BY PRODUCTID""", con)
sorted_data = filtered_data.sort_values('ProductId', axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
final=sorted_data.drop_duplicates(subset={"UserId","ProfileName","Time","Text"}, keep='first', inplace=False)
final.shape
display = pd.read_sql_query(""" select * from reviews where score != 3 and id=44737 or id = 64422 order by productid""", con)
final=final[final.HelpfulnessNumerator<=final.HelpfulnessDenominator]
final['Score'].value_counts()
count_vect = CountVectorizer()
final_counts = count_vect.fit_transform(final['Text'].values)
final_counts.shape
type(final_counts)
positive_negative = final['Score']
#Below is giving error
final_counts = hstack((final_counts,positive_negative))
Best answer
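sparse.hstack (where sparse is scipy.sparse, e.g. from scipy import sparse) combines the coo format matrices of its inputs into a new coo format matrix. final_counts is a csr matrix, so the sparse.coo_matrix(final_counts) conversion is trivial.
positive_negative is a column of a DataFrame. Look at sparse.coo_matrix(positive_negative). It is probably a (1,n) sparse matrix. But to combine it with final_counts, it needs to have (n,1) shape.
Try creating the sparse matrix and transposing it:
sparse.hstack((final_counts, sparse.coo_matrix(positive_negative).T))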
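The transpose step above can be checked on a toy example. This is only a minimal sketch: the small final_counts matrix and the three-row positive_negative Series are made-up stand-ins for the question's data, and the 1/0 label encoding is an assumption (sparse matrices store numeric values, so the 'positive'/'negative' strings are converted first).

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-ins for the question's objects: a 3x3 CSR count
# matrix and a Series of string labels with one entry per row.
final_counts = sparse.csr_matrix(np.array([[1, 0, 2],
                                           [0, 3, 0],
                                           [4, 0, 0]]))
positive_negative = pd.Series(['positive', 'negative', 'positive'])

# Assumed encoding: positive -> 1, negative -> 0, so the column is numeric.
labels = (positive_negative == 'positive').astype(int).to_numpy()

# A 1-D array becomes a (1, n) sparse matrix; transposing gives (n, 1),
# which matches the row count of final_counts.
label_col = sparse.coo_matrix(labels).T

# Stack the label column to the right of the counts and ask for CSR output.
combined = sparse.hstack((final_counts, label_col), format='csr')
print(label_col.shape)
print(combined.shape)
```

The only requirement hstack places on its inputs is that they share the same number of rows, which is why the transposed (n,1) column works while the untransposed (1,n) one does not.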
Source: the Stack Overflow question "pandas - hstack csr matrix with pandas array": https://stackoverflow.com/questions/51700979/