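I wrote Java code for malware detection using a data mining technique (k-means clustering). I use the jnetpcap library to sniff packets for analysis, and in the method nextpacket I wrote the k-means clustering algorithm for the first packet.
The algorithm works well: it creates objects from the class Instances with specific attributes, and the objects are clustered on those attributes. On the next packet, however, the same code fails because it throws an exception.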
The code I am using is:
for (int dim = 0; dim < numDimensions; dim++)
{
    Attribute current = new Attribute("Attribute" + dim, dim);
    if (dim == 0)
    {
        for (int obj = 0; obj < numInstances; obj++)
        {
            // instances.add(new SparseInstance(numDimensions));
            instances.add(new DenseInstance(numDimensions));
        }
    }
    for (int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, (Double) data[dim + 1][obj]);
    }
    atts.add(current);
}
Instances newDataset = new Instances("Dataset", atts, instances.size()); // this is the line that throws the exception
for (Instance inst : instances)
    newDataset.add(inst);
SimpleKMeans kMeans = new SimpleKMeans();
kMeans.setNumClusters(2);
// kMeans.setMaxIterations(4);
kMeans.buildClusterer(newDataset);
// int clusterNumbers;
// clusterNumbers = kMeans.numberOfClusters();
for (int j = 0; j < numInstances; j++)
{
    int classif = kMeans.clusterInstance(newDataset.get(j));
    // double[] distr = kMeans.distributionForInstance(newDataset.firstInstance());
    System.out.println(classif);
    // System.out.println(distr[0]);
    // System.out.println(distr[1]);
    ArrayList<Double> temp5 = flowFeatures.get((JFlowKey) data[0][j]);
    if (classif == 0)
    {
        // instances0.add(newDataset.get(j));
        instance0FlowFeatures.put((JFlowKey) data[0][j], temp5);
    }
    else if (classif == 1)
    {
        // instances1.add(newDataset.get(j));
        instance1FlowFeatures.put((JFlowKey) data[0][j], temp5);
    }
}
The exception I see is:
java.lang.IllegalArgumentException: Attribute names are not unique! Causes: 'Attribute0' 'Attribute1' 'Attribute2' 'Attribute3' 'Attribute4' 'Attribute5' 'Attribute6' 'Attribute7' 'Attribute0' 'Attribute1' 'Attribute2' 'Attribute3' 'Attribute4' 'Attribute5' 'Attribute6' 'Attribute7'
Can anyone help me?
Best answer
Think of attributes as the columns of a table: they must be created only once.
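A minimal plain-Java sketch of why the second packet fails, assuming (the question does not show it) that `atts` is a field of the packet handler rather than a local variable. The exception message, which lists Attribute0 to Attribute7 twice, fits that reading: each call appends eight more names to the same list. Weka is left out so the sketch runs on its own; attribute names are modelled as plain strings, `namesUnique` is only a simplified stand-in for the uniqueness check the Instances constructor performs, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AttributeHeaderSketch {

    // Hypothetical stand-in for the question's `atts` field: it lives as long as
    // the handler object, so every call appends to the same list.
    private final List<String> accumulatedAtts = new ArrayList<>();

    // Fixed variant: the header is built once and then reused for every packet.
    private List<String> header;

    // Mirrors the question's loop: one attribute name per feature dimension, per call.
    List<String> buggyHeader(int numDimensions) {
        for (int dim = 0; dim < numDimensions; dim++) {
            accumulatedAtts.add("Attribute" + dim);
        }
        return accumulatedAtts;
    }

    // Creates the attribute names only on the first call ("create the columns once").
    List<String> fixedHeader(int numDimensions) {
        if (header == null) {
            header = new ArrayList<>();
            for (int dim = 0; dim < numDimensions; dim++) {
                header.add("Attribute" + dim);
            }
        }
        return header;
    }

    // Simplified uniqueness check: false as soon as a name repeats.
    static boolean namesUnique(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AttributeHeaderSketch handler = new AttributeHeaderSketch();
        for (int packet = 1; packet <= 2; packet++) {
            List<String> buggy = handler.buggyHeader(8);
            List<String> fixed = handler.fixedHeader(8);
            System.out.println("packet " + packet
                    + ": buggy size=" + buggy.size() + " unique=" + namesUnique(buggy)
                    + ", fixed size=" + fixed.size() + " unique=" + namesUnique(fixed));
        }
    }
}
```

In Weka terms the same idea would mean building the `ArrayList<Attribute>` once (or creating a fresh list inside each call) before passing it to the Instances constructor. If `instances` is also a field, it probably needs clearing between packets for the same reason.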
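Here is code for one-dimensional data. In this example, my imaginary table has a single column, "attr1", and contains 3 records (instances); the structure is simple and easy to follow.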
import java.util.ArrayList;
import weka.clusterers.SimpleKMeans;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

// The header (the "columns") is created once.
Attribute attr1 = new Attribute("attr1");
ArrayList<Attribute> attrList = new ArrayList<Attribute>();
attrList.add(attr1);
Instances dataset = new Instances("test", attrList, 0);

// Three rows, one value each.
double[] val1 = new double[] { 1.2 };
double[] val2 = new double[] { 2.2 };
double[] val3 = new double[] { 1.4 };
Instance instance0 = new DenseInstance(1.0, val1);
instance0.setDataset(dataset);
Instance instance1 = new DenseInstance(1.0, val2);
instance1.setDataset(dataset);
Instance instance2 = new DenseInstance(1.0, val3);
instance2.setDataset(dataset);
dataset.add(instance0);
dataset.add(instance1);
dataset.add(instance2);

SimpleKMeans kmeans = new SimpleKMeans();
try {
    kmeans.setPreserveInstancesOrder(true); // required for getAssignments()
    kmeans.setNumClusters(2);
    kmeans.setSeed(2);
    kmeans.setDontReplaceMissingValues(true);
    kmeans.setMaxIterations(10); // set options before buildClusterer(), otherwise they have no effect
    kmeans.buildClusterer(dataset);
    Instances centroids = kmeans.getClusterCentroids();
    int[] assignments = kmeans.getAssignments();
    int x = 0;
    for (int assignment : assignments) {
        System.out.println("data: " + dataset.get(x) + " instance idx: " + x + " centroid value: " + centroids.get(assignment));
        x++;
    }
} catch (Exception e) {
    e.printStackTrace();
}
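To show what the assignment loop above prints conceptually, here is a plain-Java sketch of Lloyd's k-means on the same three values. It is not Weka's SimpleKMeans implementation: as an assumption, it seeds the centroids with the first k points, whereas SimpleKMeans by default picks its initial centroids randomly (driven by setSeed), so Weka's cluster numbering can differ. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class KMeans1DSketch {

    // Result holder: cluster index per point, plus the final centroids.
    static final class Result {
        final int[] assignments;
        final double[] centroids;

        Result(int[] assignments, double[] centroids) {
            this.assignments = assignments;
            this.centroids = centroids;
        }
    }

    // Lloyd's k-means on one-dimensional data.
    // Assumption: initial centroids are the first k points (deterministic, unlike Weka).
    static Result cluster(double[] data, int k, int maxIterations) {
        double[] centroids = Arrays.copyOf(data, k);
        int[] assignments = new int[data.length];
        Arrays.fill(assignments, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(data[i] - centroids[c]) < Math.abs(data[i] - centroids[best])) {
                        best = c;
                    }
                }
                if (assignments[i] != best) {
                    assignments[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: each centroid becomes the mean of its members.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < data.length; i++) {
                sum[assignments[i]] += data[i];
                count[assignments[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) {
                    centroids[c] = sum[c] / count[c];
                }
            }
        }
        return new Result(assignments, centroids);
    }

    public static void main(String[] args) {
        double[] data = {1.2, 2.2, 1.4}; // the three values from the answer
        Result r = cluster(data, 2, 10);
        for (int i = 0; i < data.length; i++) {
            System.out.println("data: " + data[i] + " instance idx: " + i
                    + " centroid value: " + r.centroids[r.assignments[i]]);
        }
    }
}
```

With this initialization, 1.2 and 1.4 end up in one cluster (centroid about 1.3) and 2.2 in the other, which is the kind of per-instance mapping that getAssignments() returns when instance order is preserved.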
For "java - weka java code kmeans clustering", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28201731/