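java - Error when saving and reloading a Guava Bloom filter - need help finding the bug in my code

Tags: java guava bloom-filter

I have been testing Google's implementation of the classic Bloom filter before using it in production. I am using version 18 of the Guava library. When I run the program below, the "varying" count printed to sysout is over 200, meaning the original filter and the reloaded one disagree on more than 200 lookups. I can't see what is wrong here. Could someone lend a second pair of eyes?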

import com.google.common.collect.Lists;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import com.google.common.hash.Hashing;
import org.apache.commons.lang3.RandomStringUtils;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.*;

/**
 * http://code.google.com/p/guava-libraries/wiki/HashingExplained
 * stackoverflow.com/questions/12319560/how-should-i-use-guavas-hashingconsistenthash
 */
public class GuavaHashing {
    private static final int N = 2500;

    public static void main(String[] args) throws IOException {
        List<String> ids = generateStoryIds(N);
        Set<String> testIds = generateTest(ids);
        bloomfiltertime(ids, testIds);
    }

    private static List<String> generateStoryIds(int size) {
        List<String> stories = new ArrayList<>();
        for (int i=0; i<size; ++i) {
            stories.add(RandomStringUtils.randomAlphanumeric(16));
        }
        return stories;
    }

    private static Set<String> generateTest(List<String> presList) {
        Set<String> test = new HashSet<>();
        Random rand = new Random(System.currentTimeMillis());
        for (int i=0; i<200; ++i) {
            test.add(presList.get(Math.abs(rand.nextInt()%N)));
        }
        for (int i=0; i<250; ++i) {
            test.add(RandomStringUtils.randomAlphanumeric(16));
        }
        return test;
    }

    public static void bloomfiltertime(List<String> storyIds, Set<String> testPresent) throws IOException {
        BloomFilter<String> stories = BloomFilter.create(Funnels.stringFunnel(Charset.defaultCharset()), N, 0.05);
        long startTime = System.currentTimeMillis();
        for(String story : storyIds) {
            stories.put(story);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("bloom put time " + (endTime - startTime));

        FileOutputStream fos = new FileOutputStream("testfile.dat");
        stories.writeTo(fos);
        fos.close();

        FileInputStream fis = new FileInputStream("testfile.dat");
        BloomFilter<String> readStories = BloomFilter.create(Funnels.stringFunnel(Charset.defaultCharset()), N, 0.05);
        startTime = System.currentTimeMillis();
        readStories.readFrom(fis, Funnels.stringFunnel(Charset.defaultCharset()));
        endTime = System.currentTimeMillis();
        System.out.println("bloom read file time " + (endTime - startTime));

        startTime = System.currentTimeMillis();
        int count = 0;
        for(String story : testPresent) {
            if(stories.mightContain(story) != readStories.mightContain(story)) {
                ++count;
            }
        }
        endTime = System.currentTimeMillis();
        System.out.println("bloom check time " + (endTime - startTime));
        System.out.println("varying : " + count);

    }
}

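Best answer

The BloomFilter#readFrom method is a static method that returns a new BloomFilter object. You are ignoring this return value (and apparently assuming that the method "fills" the object it is called on). As a result, readStories remains the empty filter you created, so it disagrees with stories on every ID that was actually inserted.

So change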

BloomFilter<String> readStories = 
    BloomFilter.create(Funnels.stringFunnel(Charset.defaultCharset()), N, 0.05);
readStories.readFrom(fis, Funnels.stringFunnel(Charset.defaultCharset()));

to

BloomFilter<CharSequence> readStories = 
    BloomFilter.readFrom(fis, Funnels.stringFunnel(Charset.defaultCharset()));

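and it should work.

(By the way: modern IDEs can warn you when you call a static method on an instance. For example, in Eclipse: Window -> Preferences -> Java -> Compiler -> Errors/Warnings -> Code Style -> set "Non-static access to static member" to "Warning".)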
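
To see the pitfall in isolation, here is a minimal sketch that needs no Guava at all. TinyFilter is a hypothetical stand-in invented for this illustration (it is backed by a HashSet and is not a real Bloom filter); only the shape of its static readFrom factory mirrors BloomFilter.readFrom. The helper names buggyReload and fixedReload are likewise made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

public class StaticFactoryPitfall {

    /** Hypothetical stand-in for a serializable filter; NOT Guava's BloomFilter. */
    static final class TinyFilter {
        private final Set<String> items = new HashSet<>();

        void put(String s) { items.add(s); }

        boolean mightContain(String s) { return items.contains(s); }

        void writeTo(OutputStream out) throws IOException {
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(items.size());
            for (String s : items) {
                data.writeUTF(s);
            }
            data.flush();
        }

        // Static factory: builds and returns a NEW object, like BloomFilter.readFrom.
        static TinyFilter readFrom(InputStream in) throws IOException {
            DataInputStream data = new DataInputStream(in);
            TinyFilter loaded = new TinyFilter();
            int n = data.readInt();
            for (int i = 0; i < n; i++) {
                loaded.put(data.readUTF());
            }
            return loaded;
        }
    }

    static byte[] serialize(TinyFilter filter) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            filter.writeTo(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the question: the static call compiles on an instance,
    // but its return value is discarded and 'reloaded' stays empty.
    static boolean buggyReload(byte[] bytes, String probe) {
        try {
            TinyFilter reloaded = new TinyFilter();
            reloaded.readFrom(new ByteArrayInputStream(bytes));
            return reloaded.mightContain(probe);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the fix: call the factory on the class and keep what it returns.
    static boolean fixedReload(byte[] bytes, String probe) {
        try {
            TinyFilter reloaded = TinyFilter.readFrom(new ByteArrayInputStream(bytes));
            return reloaded.mightContain(probe);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TinyFilter original = new TinyFilter();
        original.put("story-1");
        byte[] bytes = serialize(original);

        System.out.println("buggy reload contains story-1: " + buggyReload(bytes, "story-1")); // false
        System.out.println("fixed reload contains story-1: " + fixedReload(bytes, "story-1")); // true
    }
}
```

The buggy variant compiles without error, which is why the mistake is easy to miss. Compiling with javac -Xlint:static should flag the instance-qualified static call as a warning.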

Source: this question and its answer come from Stack Overflow: https://stackoverflow.com/questions/35505212/
