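I've been looking for a practical example of implementing KNN with weka, but everything I've found is too general for me to understand what data it needs in order to work (or how to build the objects it needs) and what the results it shows mean. Maybe someone who has used it before has a better example, with realistic things (products, movies, books, etc.) rather than the typical letters you see in algebra.
That way I could figure out how to implement it in my case (which is recommending dishes to active users with KNN). Any help would be appreciated, thanks.
I was trying to understand it from this link https://www.ibm.com/developerworks/library/os-weka3/index.html but I don't even understand how they got those results or how they got the formula
Step 1: Determine the distance formula
Distance = SQRT( ((58 - Age)/(69-35))^2 + ((51000 - Income)/(150000-38000))^2 )
Why is it always /(69-35) and /(150000-38000)?
Edit:
Here is the code I tried, without success. I'd appreciate it if someone could clear it up for me. I put this code together by combining these two answers:
This answer shows how to get the KNN:
How to get the nearest neighbor in weka using java
And this one shows me how to create instances (I don't really know what they are for weka): Adding a new Instance in weka
So I came up with this: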
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.neighboursearch.LinearNNSearch;

public class Wekatest {
    public static void main(String[] args) {
        ArrayList<Attribute> atts = new ArrayList<>();
        ArrayList<String> classVal = new ArrayList<>();
        // I don't really understand what's happening here
        classVal.add("A");
        classVal.add("B");
        classVal.add("C");
        classVal.add("D");
        classVal.add("E");
        classVal.add("F");
        atts.add(new Attribute("content", (ArrayList<String>) null));
        atts.add(new Attribute("@@class@@", classVal));
        // In my case the data to evaluate are dishes ("plato" means dish in Spanish)
        Instances dataRaw = new Instances("TestInstancesPlatos", atts, 0);
        // I imagine that every instance is like an object that will be compared with the other instances to get its nearest neighbours (so an instance is like a dish for me)
        double[] instanceValue1 = new double[dataRaw.numAttributes()];
        instanceValue1[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue1[1] = 0;
        dataRaw.add(new DenseInstance(1.0, instanceValue1));
        double[] instanceValue2 = new double[dataRaw.numAttributes()];
        instanceValue2[0] = dataRaw.attribute(0).addStringValue("Tunas");
        instanceValue2[1] = 1;
        dataRaw.add(new DenseInstance(1.0, instanceValue2));
        double[] instanceValue3 = new double[dataRaw.numAttributes()];
        instanceValue3[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue3[1] = 2;
        dataRaw.add(new DenseInstance(1.0, instanceValue3));
        double[] instanceValue4 = new double[dataRaw.numAttributes()];
        instanceValue4[0] = dataRaw.attribute(0).addStringValue("Hamburguers");
        instanceValue4[1] = 3;
        dataRaw.add(new DenseInstance(1.0, instanceValue4));
        double[] instanceValue5 = new double[dataRaw.numAttributes()];
        instanceValue5[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue5[1] = 4;
        dataRaw.add(new DenseInstance(1.0, instanceValue5));
        System.out.println("---------------------");
        try {
            // The LinearNNSearch constructor declares "throws Exception", so it is created inside the try block
            LinearNNSearch knn = new LinearNNSearch(dataRaw);
            // This method receives the target instance whose neighbours you want and N (I don't really know what N is, but I imagine it is the number of neighbours I want)
            Instances nearestInstances = knn.kNearestNeighbours(dataRaw.get(0), 1);
            // I expected the output to be the closest neighbour to dataRaw.get(0), which would be Pizzas, but instead I got some data that I don't really understand.
            System.out.println(nearestInstances);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
OUTPUT:
---------------------
@relation TestInstancesPlatos
@attribute content string
@attribute @@class@@ {A,B,C,D,E,F}
@data
Pizzas,A
Tunas,B
Pizzas,C
Hamburguers,D
Weka dependency used:
<dependency>
<groupId>nz.ac.waikato.cms.weka</groupId>
<artifactId>weka-stable</artifactId>
<version>3.8.0</version>
</dependency>
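Best answer
KNN is a machine learning technique usually classified as an "instance-based predictor". It takes all the instances of the labelled samples and plots them in an n-dimensional space.
Using a distance measure such as Euclidean distance, KNN looks for the nearest points in this n-dimensional space and, based on those neighbours, estimates which class a new point belongs to. If it is closer to blue points, it is blue; if it is closer to red points...
But now, how do we apply this to your problem?
Imagine you have only two attributes: price and calories (a two-dimensional space). You want to classify customers into three classes: Fit, Junk Food and Gourmet. With this, you can present offers in your restaurant that match each customer's preferences.
You have the following data: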
+-------+----------+-----------+
| Price | Calories | Food Type |
+-------+----------+-----------+
| $2 | 350 | Junk Food |
+-------+----------+-----------+
| $5 | 700 | Junk Food |
+-------+----------+-----------+
| $10 | 200 | Fit |
+-------+----------+-----------+
| $3 | 400 | Junk Food |
+-------+----------+-----------+
| $8 | 150 | Fit |
+-------+----------+-----------+
| $7 | 650 | Junk Food |
+-------+----------+-----------+
| $5 | 120 | Fit |
+-------+----------+-----------+
| $25 | 230 | Gourmet |
+-------+----------+-----------+
| $12 | 210 | Fit |
+-------+----------+-----------+
| $40 | 475 | Gourmet |
+-------+----------+-----------+
| $37 | 600 | Gourmet |
+-------+----------+-----------+
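Now, let's see it plotted in a 2D space:
What happens next?
For each new entry, the algorithm computes the distance to all points (instances) and finds the k nearest ones. The classes of those k nearest points determine the class of the new entry.
Take k = 3 and the values $15 and 165 calories. Let's find the 3 nearest neighbours:
This is where the distance formula comes in. It is evaluated for every point. The distances are then ranked, and the k closest points decide the final class.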
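The ranking described above can be sketched in plain Java, without Weka. This is only a minimal sketch, not Weka's implementation: the names KnnSketch, nearest and classify are made up for illustration, the data is copied from the table, the distance is plain (unnormalized) Euclidean distance, and a tie in the vote goes to whichever label reached the top count first, which is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KnnSketch {
    // Training data copied from the table above: {price in $, calories}.
    static final double[][] POINTS = {
        {2, 350}, {5, 700}, {10, 200}, {3, 400}, {8, 150}, {7, 650},
        {5, 120}, {25, 230}, {12, 210}, {40, 475}, {37, 600}
    };
    static final String[] LABELS = {
        "Junk Food", "Junk Food", "Fit", "Junk Food", "Fit", "Junk Food",
        "Fit", "Gourmet", "Fit", "Gourmet", "Gourmet"
    };

    // Plain (unnormalized) Euclidean distance between two 2-D points.
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Indices of the k training points closest to the query, nearest first.
    static List<Integer> nearest(double[] query, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < POINTS.length; i++) {
            idx.add(i);
        }
        idx.sort(Comparator.comparingDouble(i -> distance(query, POINTS[i])));
        return idx.subList(0, k);
    }

    // Majority vote among the labels of the k nearest points.
    static String classify(double[] query, int k) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int i : nearest(query, k)) {
            int count = votes.merge(LABELS[i], 1, Integer::sum);
            if (best == null || count > votes.get(best)) {
                best = LABELS[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] query = {15, 165}; // $15 and 165 calories, as in the text
        for (int i : nearest(query, 3)) {
            System.out.println("$" + POINTS[i][0] + ", " + POINTS[i][1] + " cal, "
                    + LABELS[i] + ", distance " + distance(query, POINTS[i]));
        }
        System.out.println("Predicted: " + classify(query, 3));
    }
}
```

For the $15 / 165-calorie query with k = 3, the three closest rows are ($8, 150), ($10, 200) and ($12, 210), all labelled Fit, so the sketch predicts Fit.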
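Now, why are the values divided by (69-35) and (150000-38000)? As mentioned in other answers, this is due to normalization: each difference is divided by the range of its attribute (presumably the maximum minus the minimum age and income in that dataset). Our example uses price and calories. As you can see, calories are on a much larger scale than price (more units per value). Without normalization, calories would weigh far more than price when deciding the class (which, for example, would kill the Gourmet class). All attributes need to be equally important, hence the normalization.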
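To see what the division does, here is a small sketch that compares raw and range-normalized distances on two rows of the table. The query dish ($35, 350 calories) is a made-up example, and NormalizationSketch and its method names are hypothetical. The ranges (40 - 2 for price, 700 - 120 for calories) are the maximum minus the minimum of each column in the table, by analogy with (69-35) and (150000-38000) in the quoted formula.

```java
class NormalizationSketch {
    // Ranges (max - min) of each column in the table above.
    static final double PRICE_RANGE = 40 - 2;
    static final double CALORIE_RANGE = 700 - 120;

    // Euclidean distance on the raw values; points are {price, calories}.
    static double rawDistance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Euclidean distance after dividing each difference by its column's range.
    static double normalizedDistance(double[] a, double[] b) {
        return Math.hypot((a[0] - b[0]) / PRICE_RANGE,
                          (a[1] - b[1]) / CALORIE_RANGE);
    }

    public static void main(String[] args) {
        double[] query = {35, 350};   // hypothetical new dish: $35, 350 calories
        double[] junk = {2, 350};     // a Junk Food row from the table
        double[] gourmet = {40, 475}; // a Gourmet row from the table

        System.out.println("raw:        junk=" + rawDistance(query, junk)
                + " gourmet=" + rawDistance(query, gourmet));
        System.out.println("normalized: junk=" + normalizedDistance(query, junk)
                + " gourmet=" + normalizedDistance(query, gourmet));
    }
}
```

With raw values the Junk Food row looks closer (33 versus about 125) only because calorie differences are numerically larger. After dividing by the ranges, the Gourmet row is the closer one (about 0.25 versus about 0.87), which matches the intuition that a $35 dish sits with the expensive dishes.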
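Weka abstracts this for you, but you can also visualize it. See this example of a visualization from a project I made for a Weka ML course:
Note that because there are many more than 2 dimensions, there are many plots, but the idea is the same.
Explaining the code: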
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.neighboursearch.LinearNNSearch;

public class Wekatest {
    public static void main(String[] args) {
        // These two ArrayLists describe the inputs of your algorithm.
        // atts holds the attributes that you're going to pass for training, usually called X.
        // classVal holds the values of the target class that is to be predicted, usually called y.
        ArrayList<Attribute> atts = new ArrayList<>();
        ArrayList<String> classVal = new ArrayList<>();
        // Here you initialize a "dictionary" of all the distinct classes (e.g. types of restaurant) that can be predicted.
        classVal.add("A");
        classVal.add("B");
        classVal.add("C");
        classVal.add("D");
        classVal.add("E");
        classVal.add("F");
        // The next two lines define the attributes: a string attribute called "content" and a nominal attribute holding the class of the already labelled values.
        atts.add(new Attribute("content", (ArrayList<String>) null));
        atts.add(new Attribute("@@class@@", classVal));
        // This creates an empty Weka dataset with those attributes. "TestInstancesPlatos" is only the relation name (nothing is read from a file) and 0 is the initial capacity.
        // dataRaw holds the previously labelled instances that are used to "train the model" (kNN doesn't actually train anything, it uses all the data at prediction time).
        Instances dataRaw = new Instances("TestInstancesPlatos", atts, 0);
        // Here you're creating the instances. This is where you can substitute your real inputs in production.
        double[] instanceValue1 = new double[dataRaw.numAttributes()];
        // It looks like you only have 2 attributes: a food product and maybe a rating.
        instanceValue1[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue1[1] = 0; // index into classVal, so 0 means "A"
        // You're appending this instance to the dataset.
        dataRaw.add(new DenseInstance(1.0, instanceValue1));
        double[] instanceValue2 = new double[dataRaw.numAttributes()];
        instanceValue2[0] = dataRaw.attribute(0).addStringValue("Tunas");
        instanceValue2[1] = 1;
        dataRaw.add(new DenseInstance(1.0, instanceValue2));
        double[] instanceValue3 = new double[dataRaw.numAttributes()];
        instanceValue3[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue3[1] = 2;
        dataRaw.add(new DenseInstance(1.0, instanceValue3));
        double[] instanceValue4 = new double[dataRaw.numAttributes()];
        instanceValue4[0] = dataRaw.attribute(0).addStringValue("Hamburguers");
        instanceValue4[1] = 3;
        dataRaw.add(new DenseInstance(1.0, instanceValue4));
        double[] instanceValue5 = new double[dataRaw.numAttributes()];
        instanceValue5[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue5[1] = 4;
        dataRaw.add(new DenseInstance(1.0, instanceValue5));
        // After adding 5 instances, time to test:
        System.out.println("---------------------");
        try {
            // Load the algorithm with the data (the constructor declares "throws Exception", hence the try block).
            LinearNNSearch knn = new LinearNNSearch(dataRaw);
            // You're asking for the neighbours of instance 0 of dataRaw. The second argument (1) is the number of neighbours requested.
            Instances nearestInstances = knn.kNearestNeighbours(dataRaw.get(0), 1);
            // This prints the neighbouring instances themselves (in ARFF format), each labelled with one of the classes A to F that you passed.
            // More than k instances can come back when several are tied at the same distance, which appears to be what happened in your output.
            System.out.println(nearestInstances);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
What should you do?
-> Gather data.
-> Define a set of attributes that help you predict which cuisine you have (e.g. prices, dishes or ingredients, with one attribute for each dish or ingredient).
-> Organize this data.
-> Define a set of labels.
-> Manually label a set of data.
-> Load labelled data to KNN.
-> Label new instances by passing their attributes to KNN. It'll return the label chosen from the k nearest neighbors (good values for k are often 3 or 5, but you have to test).
-> Have fun!
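The "one attribute for each dish or ingredient" step can be sketched as follows. This is only an illustration under assumptions: the ingredient vocabulary and the three dishes are invented, and DishEncodingSketch, encode and distance are hypothetical names. The point is that a dish name such as "Pizzas" gives a distance measure nothing to compare, whereas a vector of 0/1 ingredient flags does.

```java
import java.util.Arrays;
import java.util.List;

class DishEncodingSketch {
    // Hypothetical ingredient vocabulary; in practice it comes from your own data.
    static final List<String> INGREDIENTS =
            Arrays.asList("cheese", "tomato", "tuna", "beef", "bread");

    // Turns a dish's ingredient list into a 0/1 vector, one position per known ingredient.
    static double[] encode(List<String> dishIngredients) {
        double[] vector = new double[INGREDIENTS.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = dishIngredients.contains(INGREDIENTS.get(i)) ? 1.0 : 0.0;
        }
        return vector;
    }

    // Euclidean distance between two encoded dishes of the same length.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] pizza = encode(Arrays.asList("cheese", "tomato", "bread"));
        double[] hamburger = encode(Arrays.asList("beef", "cheese", "bread"));
        double[] tunaSalad = encode(Arrays.asList("tuna", "tomato"));

        System.out.println("pizza vector: " + Arrays.toString(pizza));
        System.out.println("pizza vs hamburger:  " + distance(pizza, hamburger));
        System.out.println("pizza vs tuna salad: " + distance(pizza, tunaSalad));
    }
}
```

In this made-up data the pizza and the hamburger differ in two ingredients and the pizza and the tuna salad in three, so the hamburger ends up as the pizza's nearer neighbour. Vectors like these are what you would then label and hand to KNN.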
On the topic of weka - where to find a practical example of KNN using weka in Java, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57739084/