python - How do I fit an SVM model when the true labels are a MultiIndex of matching pairs?

Tags: python pandas scikit-learn multi-index record-linkage

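I am trying to fit an SVM model where the ground truth for my predictions is a MultiIndex of matching pairs. The problem is that I don't know how to tell the model that this MultiIndex holds the true values.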

I can't use the record linkage classification step because it is not flexible enough.

from sklearn.svm import SVC

golden_pairs = filter_tests_new_df[:training_value]
golden_matches_index = golden_pairs[golden_pairs['ev_2'] == 1].index 
# This is a multiindex type

svm = SVC(gamma='auto')
svm.fit(golden_pairs, golden_matches_index) 
# I don't know how to specify that golden_matches_index holds the true matches

# Predict the match status for all record pairs
result_svm = svm.predict(test_pairs[columns_to_keep])

Best answer

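You don't need to pass the index at all. Instead, use the boolean Series produced by the comparison (one value per record pair) as the classification labels.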

Here is an example.

import pandas as pd

# Sample data
data = pd.DataFrame({'a': [1, 2, 3],
                     'b': [1, 1, 0]})

data
   a  b
0  1  1
1  2  1
2  3  0

# Generating labels
data['b'] == 1
0     True
1     True
2    False
Name: b, dtype: bool

# Can convert them to integer if required
(data['b'] == 1).astype(int)
0    1
1    1
2    0
Name: b, dtype: int64
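
The same idea carries over to a DataFrame indexed by record pairs. The sketch below is only an illustration: the pair index, the `name_sim` feature, and the sample values are made up, and only the `ev_2` column name comes from the question. It shows that a MultiIndex of known matches can be turned back into one label per row with `isin`, which gives the same labels as comparing the column directly.

```python
import pandas as pd

# Hypothetical candidate pairs, indexed by (id_left, id_right)
pair_index = pd.MultiIndex.from_tuples(
    [(0, 10), (0, 11), (1, 10), (1, 12)], names=["id_left", "id_right"]
)
pairs = pd.DataFrame(
    {"name_sim": [0.9, 0.2, 0.1, 0.8], "ev_2": [1, 0, 0, 1]},
    index=pair_index,
)

# MultiIndex of the known matches, built the same way as in the question
matches_index = pairs[pairs["ev_2"] == 1].index

# One boolean label per row: True where the pair is among the known matches
labels_from_index = pd.Series(
    pairs.index.isin(matches_index), index=pairs.index, name="is_match"
)

# Comparing the column directly yields the same labels
labels_from_column = pairs["ev_2"] == 1
print(labels_from_index.tolist() == labels_from_column.tolist())
```

Either form has exactly one entry per row of the feature table, which is the shape a classifier expects for its labels.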

Based on your code, I think this should do the trick:

# Boolean
golden_pairs['ev_2'] == 1

# Integer
(golden_pairs['ev_2'] == 1).astype(int)
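
Putting it together, a minimal end-to-end sketch might look like the following. The data, the feature columns `name_sim` and `date_sim`, and the variable `feature_columns` are assumptions made for illustration; only `ev_2`, `golden_pairs`, `test_pairs`, and `SVC(gamma='auto')` come from the question. Note that the label column is left out of the features here, since fitting on a table that still contains `ev_2` would leak the answer into the model.

```python
import pandas as pd
from sklearn.svm import SVC

# Hypothetical labelled training pairs, indexed by (id_left, id_right)
train_index = pd.MultiIndex.from_tuples(
    [(0, 10), (0, 11), (1, 10), (1, 12), (2, 13), (2, 14)],
    names=["id_left", "id_right"],
)
golden_pairs = pd.DataFrame(
    {
        "name_sim": [0.95, 0.10, 0.20, 0.90, 0.85, 0.15],
        "date_sim": [1.00, 0.00, 0.10, 0.80, 0.90, 0.20],
        "ev_2": [1, 0, 0, 1, 1, 0],
    },
    index=train_index,
)

# Features exclude the label column; labels are one integer per pair
feature_columns = ["name_sim", "date_sim"]  # assumed feature names
X_train = golden_pairs[feature_columns]
y_train = (golden_pairs["ev_2"] == 1).astype(int)

svm = SVC(gamma="auto")
svm.fit(X_train, y_train)

# Hypothetical unlabelled pairs to classify
test_index = pd.MultiIndex.from_tuples(
    [(3, 15), (3, 16)], names=["id_left", "id_right"]
)
test_pairs = pd.DataFrame(
    {"name_sim": [0.92, 0.05], "date_sim": [0.95, 0.10]}, index=test_index
)

# Re-attach the pair index to the predictions, then keep the predicted matches
predicted = pd.Series(
    svm.predict(test_pairs[feature_columns]), index=test_pairs.index
)
predicted_matches_index = predicted[predicted == 1].index
print(predicted_matches_index)
```

Wrapping the predictions in a Series that reuses `test_pairs.index` is what recovers a MultiIndex of predicted matches, in the same form as `golden_matches_index` in the question.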

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/56669487/
