java - Counting unique occurrences of a string in a document

Tags: java arraylist

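I am reading a log file into Java. For each line in the log file, I check whether the line contains an IP address. If it does, I want to add 1 to the number of times that IP address has appeared in the log file. How can I accomplish this in Java?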

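The code below successfully extracts the IP address from every line that contains one, but the process of counting the occurrences of each IP address does not work.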

void read(String fileName) throws IOException {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
    int counter = 0;
    ArrayList<IPHolder> ips = new ArrayList<IPHolder>();
    try {
        String line;
        while ((line = br.readLine()) != null) {
            if(!getIP(line).equals("0.0.0.0")){
                if(ips.size()==0){
                    IPHolder newIP = new IPHolder();
                    newIP.setIp(getIP(line));
                    newIP.setCount(0);
                    ips.add(newIP);
                }
                for(int j=0;j<ips.size();j++){
                    if(ips.get(j).getIp().equals(getIP(line))){
                        ips.get(j).setCount(ips.get(j).getCount()+1);
                    }else{
                        IPHolder newIP = new IPHolder();
                        newIP.setIp(getIP(line));
                        newIP.setCount(0);
                        ips.add(newIP);
                    }
                }
                if(counter % 1000 == 0){System.out.println(counter+", "+ips.size());}
                counter+=1;
            }
        }
    } finally {br.close();}
    for(int k=0;k<ips.size();k++){
        System.out.println("ip, count: "+ips.get(k).getIp()+" , "+ips.get(k).getCount());
    }
}

public String getIP(String ipString){//extracts an ip from a string if the string contains an ip
    String IPADDRESS_PATTERN = 
    "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
    Matcher matcher = pattern.matcher(ipString);
    if (matcher.find()) {
        return matcher.group();
    }
    else{
        return "0.0.0.0";
    }
}

The holder class is:

public class IPHolder {

    private String ip;
    private int count;

    public String getIp(){return ip;}
    public void setIp(String i){ip=i;}

    public int getCount(){return count;}
    public void setCount(int ct){count=ct;}
}

Best answer

The keyword to search for here is HashMap. A HashMap is a collection of key-value pairs (in this case, each IP paired with its count).

"192.168.1.12" - 12
"192.168.1.13" - 17
"192.168.1.14" - 9

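And so on. This is much easier to use and access than iterating over an array of holder objects every time to find out whether a holder for that IP already exists.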

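// Imports needed by this snippet
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.Set;
import java.util.TreeMap;

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(/*Your file */)));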

HashMap<String, Integer> occurrences = new HashMap<String, Integer>();

String line = null;

while( (line = br.readLine()) != null) {

    // Iterate over lines and search for ip address patterns
    String[] addressesFoundInLine = ...;


    for(String ip: addressesFoundInLine ) {

        // Did you already have that address in your file earlier? If yes, increase its counter by 1
        if(occurrences.containsKey(ip))
            occurrences.put(ip, occurrences.get(ip)+1);

        // If not, create a new entry for this address
        else
            occurrences.put(ip, 1);
    } 
}


// TreeMaps are automatically ordered by key if their keys implement 'Comparable', which is the case for strings and integers
TreeMap<Integer, ArrayList<String>> turnedAround = new TreeMap<Integer, ArrayList<String>>();

Set<Entry<String, Integer>> es = occurrences.entrySet();

// Switch keys and values of HashMap and create a new TreeMap (in case there are two ips with the same count, add them to a list)
for(Entry<String, Integer> en: es) {

    if(turnedAround.containsKey(en.getValue()))         
        turnedAround.get(en.getValue()).add(en.getKey());
    else {
        ArrayList<String> ips = new ArrayList<String>();
        ips.add(en.getKey());
        turnedAround.put(en.getValue(), ips);
    }

}

// Print out the values (if two ips have the same count they are printed in no particular order; that would require another sorting step)
for(Entry<Integer, ArrayList<String>> entry: turnedAround.entrySet()) {         
    for(String s: entry.getValue())
        System.out.println(s + " - " + entry.getKey());         
}

In my case, the output looks like this:

192.168.1.19 - 4
192.168.1.18 - 7
192.168.1.27 - 19
192.168.1.13 - 19
192.168.1.12 - 28
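
The snippet above leaves the extraction of addresses from each line open. As a minimal, self-contained sketch of the whole approach, the following combines the regular expression from the question with a HashMap counter. The class name `IpCounter`, the helper names `countIps` and `sortedByCount`, and the sample log lines are made up for illustration; `Map.merge` requires Java 8 or later. Sorting uses a comparator rather than an inverted TreeMap, so addresses with equal counts come out in a fixed order (by IP string).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpCounter {

    // Same IPv4 pattern as in the question, compiled once instead of once per line.
    private static final Pattern IP_PATTERN = Pattern.compile(
        "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

    // Hypothetical helper: counts every IP match found in the given lines.
    static Map<String, Integer> countIps(Iterable<String> lines) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = IP_PATTERN.matcher(line);
            // find() is called repeatedly so that several addresses on one line are all counted.
            while (matcher.find()) {
                // Insert 1 for a new key, otherwise add 1 to the stored count.
                occurrences.merge(matcher.group(), 1, Integer::sum);
            }
        }
        return occurrences;
    }

    // Hypothetical helper: entries ordered by count (ascending), ties broken by IP string.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> occurrences) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(occurrences.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue()
            .thenComparing(Map.Entry.comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the log file.
        List<String> lines = Arrays.asList(
            "10.0.0.1 GET /index.html",
            "10.0.0.2 GET /about.html",
            "no address here",
            "10.0.0.1 POST /login",
            "forwarded 10.0.0.3 via 10.0.0.1");

        for (Map.Entry<String, Integer> entry : sortedByCount(countIps(lines))) {
            System.out.println(entry.getKey() + " - " + entry.getValue());
        }
    }
}
```

With the sample lines this prints `10.0.0.2 - 1`, `10.0.0.3 - 1`, `10.0.0.1 - 3`. To read a real log, the lines could come from a `BufferedReader` loop as in the snippet above instead of the in-memory list.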

I answered this question about half an hour ago, and I think it is exactly what you are looking for, so take a look if you need some sample code.
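
As for why the list-based loop in the question miscounts: its `else` branch runs once for every stored entry that does not match the current IP, so a single line can append several duplicate holders, and the list can grow very quickly. The decision to add a new holder should be made only after the whole list has been scanned. A minimal sketch of that fix follows; `ListCountSketch`, `record`, and the simplified `IPHolder` are stand-ins written for this example, not the question's exact classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListCountSketch {

    // Simplified stand-in for the question's IPHolder (plain fields for brevity).
    static class IPHolder {
        final String ip;
        int count;

        IPHolder(String ip) {
            this.ip = ip;
        }
    }

    // Hypothetical helper: increment an existing holder, or add one after a full scan.
    static void record(List<IPHolder> ips, String ip) {
        for (IPHolder holder : ips) {
            if (holder.ip.equals(ip)) {
                holder.count++;
                return; // found: stop before anything is appended
            }
        }
        // Reached only when no existing holder matched.
        IPHolder fresh = new IPHolder(ip);
        fresh.count = 1; // the first sighting counts as one occurrence
        ips.add(fresh);
    }

    public static void main(String[] args) {
        List<IPHolder> ips = new ArrayList<>();
        for (String ip : Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.1")) {
            record(ips, ip);
        }
        for (IPHolder holder : ips) {
            System.out.println("ip, count: " + holder.ip + " , " + holder.count);
        }
    }
}
```

This keeps the ArrayList design working, but every lookup still scans the list, which is why the HashMap version is the better fit for large log files.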

Regarding java - Counting unique occurrences of a string in a document, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27325368/
