java - 在java中将SUBSTRING转换为整数的最佳方法

标签 java string integer type-conversion

在java中,在不使用Integer.parseInt的情况下将子字符串转换为整数的最快方法是什么?我想知道是否有办法避免 parseInt,因为它需要我创建一个临时字符串,它是我想要转换的子字符串的副本。

"abcd12345abcd"  <-- just want chars 4..8 converted.

I would like to avoid making a new temp string by not using substring.

If I were to roll my own, is there a way to avoid the overhead of the array bounds checking i see inside String.charAt(int)?

EDIT

I got a lot of good information from everyone...and the usual warnings about pre-optimization :) The basic answer is that there is nothing better than String.charAt or char[]. Unsafe code is on the way out (maybe). It is likely that the compiler can optimize away excessive range checking on [].

I did some benchmarking, and the savings due to not using substring and rolling a specific parseInt are huge.

32% of the cost of calling Integer.parseInt(str.substring(4,8)) comes from the substring. this does not include subsequent garbage collection costs.

Integer.parseInt is designed to handle a very wide set of inputs. By rolling my own parseInt (specific to what our data looks like) using charAt, I was able to achieve a 6x speedup over the substring method.

The comment to try char[] lead to a performance increase of about 7x. However your data must already be in a char[] as the cost to convert to a char array is high. For parsing text, it seems like it makes sense to stay entirely within char[] and write a few functions to compare strings.

Benchmark results (smaller is faster):

parseInt(substring)  23731665
parseInt(string)     16859226
Atoi1                 7116633
Atoi2                 4514031
Atoi3 char[]          4135355
Atoi4 char[]          3503638
Atoi5 char[]          5485495
GetNumber1            8666020
GetNumber2            5951939

在基准测试期间,我还尝试了打开和关闭内联,并验证编译器是否正确内联了所有内容。

如果有人关心的话,这是我的基准测试代码...

package javaatoi;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JavaAtoi {

    static int cPasses = 10;
    static int cTests = 9;
    static int cIter = 0x100000;
    static int cString = 0x100;
    static int fStringMask = cString - 1;

    public static void main(String[] args) throws InterruptedException {

        // setup test data.  Use a large enough set that the compiler 
        // wont unroll the loop.  Use a small enough set that we are 
        // keeping the data in L2.  I don't want to measure memory loads.

        String[] a = new String[cString];
        for (int i = 0 ; i< cString ; i+=4) {
            // leading zeros will occur, so add one number with one.
            a[i+0] = "abcd01234abcd";
            a[i+1] = "abcd1234abcd";
            a[i+2] = "abcd1234abcd";
            a[i+3] = "abcd1234abcd";
        }

        // array of pre-substringed stuff
        String[] a1 = new String[cString];
        for (int i=0 ; i< cString ; ++i)
            a1[i]= a[i].substring(4,8);

        // char array version of the strings
        char[][] b = new char[cString][];
        for (int i =0 ; i<cString ; ++i)
            b[i] = a[i].toCharArray();

        // array to hold times for each test for each pass
        long[][] t = new long[cPasses][cTests];

        // multiple dry runs to let the compiler optimize the functions
        for (int i=0 ; i<50 ; ++i) {
          t[0][0] = TestParseInt1(a)[0];
          t[0][1] = TestParseInt2(a1)[0];
          t[0][2] = TestAtoi1(a)[0];
          t[0][3] = TestAtoi2(a)[0];
          t[0][4] = TestAtoi3(b)[0];
          t[0][5] = TestAtoi4(b)[0];
          t[0][6] = TestAtoi5(b)[0];
          t[0][7] = TestAtoi6(a)[0];
          t[0][8] = TestAtoi7(a)[0];
        }

        // now do a bunch of tests
        for (int i=0 ; i<cPasses ; ++i) {
            t[i][0] = TestParseInt1(a)[0];
            t[i][1] = TestParseInt2(a1)[0];
            t[i][2] = TestAtoi1(a)[0];
            t[i][3] = TestAtoi2(a)[0];
            t[i][4] = TestAtoi3(b)[0];
            t[i][5] = TestAtoi4(b)[0];
            t[i][6] = TestAtoi5(b)[0];
            t[i][7] = TestAtoi6(a)[0];
            t[i][8] = TestAtoi7(a)[0];
        }

        // setup mins - we only care about min time.
        t[cPasses-1] = new long[cTests];
        for (int i=0 ; i<cTests ; ++i)
            t[cPasses-1][i] = 999999999;
        for (int j=0 ; j<cTests ; ++j) {
            for (int i=0 ; i<cPasses-1 ; ++i) {
                long n = t[i][j];
                if (n < t[cPasses-1][j])
                    t[cPasses-1][j] = n;
            }
        }

        // output string
        String s = new String();
        for (int j=0 ; j<cTests ; ++j) {
            for (int i=0 ; i<cPasses ; ++i) {
                long n = t[i][j];
                s += String.format("%9d", n);
            }
            s += "\n";
        }
        System.out.println(s);

        // if you comment out the part of TestParseInt1 you can sorta see the 
        // gc cost.
        System.gc(); // Trying to get an idea of the total substring cost
        Thread.sleep(1000);  // i dunno if this matters.  Seems like the gc takes a little while.  Not real exact...

        long collectionTime = 0;
        for (GarbageCollectorMXBean garbageCollectorMXBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long n = garbageCollectorMXBean.getCollectionTime();
            if (n > 0) 
                collectionTime += n;
        }

        System.out.println(collectionTime*1000000);
    }

   // you have to put each test function in its own wrapper to 
   // get the compiler to fairly optimize each test.
   // I also made sure I incremented n and used a large # of string
   // to make it harder for the compiler to eliminate the loops.

    static long[] TestParseInt1(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        // comment this out to get an idea of gc cost without the substrings
        // then uncomment to get idea of gc cost with substrings
        for (int i=0 ; i<cIter ; ++i) 
            n += Integer.parseInt(a[i&fStringMask].substring(4,8));
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestParseInt2(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Integer.parseInt(a[i&fStringMask]);
        return new long[] { System.nanoTime() - startTime, n };
    }


    static long[] TestAtoi1(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi1(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi2(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi2(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi3(char[][] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi3(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi4(char[][] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi4(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi5(char[][] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi5(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi6(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi6(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi7(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi7(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static int Atoi1(String s, int i0, int cb) {
        int n = 0;
        boolean fNeg = false;   // for unsigned T, this assignment is removed by the optimizer
        int i = i0;
        int i1 = i + cb;
        int ch;
        // skip leading crap, scan for -
        for ( ; i<i1 && ((ch = s.charAt(i)) > '9' || ch <= '0') ; ++i) {
            if (ch == '-') 
                fNeg = !fNeg;
        }
        // here is the loop to process the valid number chars.
        for ( ; i<i1 ; ++i) 
            n = n*10 + (s.charAt(i) - '0'); 
        return (fNeg) ? -n : n;
    }

    static int Atoi2(String s, int i0, int cb) {
        int n = 0;
        for (int i=i0 ; i<i0+cb ; ++i) {
            char ch = s.charAt(i);
            n = n*10 + ((ch <= '0') ? 0 : ch - '0');
        }
        return n;
    }

    static int Atoi3(char[] s, int i0, int cb) {
        int n = 0, i = i0, i1 = i + cb;
        // skip leading spaces or zeros
        for ( ; i<i1 && s[i] <= '0' ; ++i) { }
        // loop to process the valid number chars.
        for ( ; i<i1 ; ++i) 
            n = n*10 + (s[i] - '0');
        return n;
    }   

    static int Atoi4(char[] s, int i0, int cb) {
        int n = 0;
        // loop to process the valid number chars.
        for (int i=i0 ; i<i0+cb ; ++i) {
            char ch = s[i];
            n = n*10 + ((ch <= '0') ? 0 : ch - '0');
        }
        return n;
    }   

    static int Atoi5(char[] s, int i0, int cb) {
        int ch, n = 0, i = i0, i1 = i + cb;
        // skip leading crap or zeros
        for ( ; i<i1 && ((ch = s[i]) <= '0' || ch > '9') ; ++i) { }
        // loop to process the valid number chars.
        for ( ; i<i1 && (ch = s[i] - '0') >= 0 && ch <= 9 ; ++i) 
            n = n*10 + ch;
        return n;
    }   

    static int Atoi6(String data, int start, int length) {
        int number = 0;
        for (int i = start; i <= start + length; i++) {
            if (Character.isDigit(data.charAt(i))) {
                number = (number * 10) + (data.charAt(i) - 48);
            }
        }       
        return number;
    }

    static int Atoi7(String data, int start, int length) {
        int number = 0;
        for (int i = start; i <= start + length; i++) {
            char ch = data.charAt(i);
            if (ch >= '0' && ch <= '9') {
                number = (number * 10) + (ch - 48);
            }
        }       
        return number;
    }

}

最佳答案

抱歉...如果没有其中之一,确实无法完成您想做的事情:

  • 创建中间字符串,或
  • 创建一些其他中间对象来代替 String,然后将其解析为 int

Java 与 C++ 不同; a String isn't the same as a char[] .

正如我之前提到的,对 String 执行的任何返回 String 的操作都会生成一个 String 实例,因此不可避免地,您将以中间方式处理 String

这里的主要问题是,如果您实际上知道子字符串边界,那么使用它们来完成您需要的事情。

Do not worry about optimization直到您可以推断出这部分代码是最大的瓶颈。即便如此,也要坚持有意义的优化;您可以将整个 String 转换为 IntStream 并仅解析 Java 8 中实际数字的元素。

这段代码很可能不会对性能造成重大影响,过早地优化它会导致您走上一条非常、非常痛苦的道路。

实际上,您可以得到的最接近的结果(使用 Java 8 的 Stream API)是在 CharacterString 之间进行一些转换,但这仍然会创建中间 Strings:

System.out.println(Integer.parseInt("abcd12345abcd".chars()
                                                   .filter(Character::isDigit)
                                                   .mapToObj(c -> (char) c)
                                                   .map(Object::toString)
                                                   .reduce("", String::concat)));

...阅读和理解起来比这难看:

System.out.println(Integer.parseInt("abcd12345abcd".substring(4, 9)));

关于java - 在java中将SUBSTRING转换为整数的最佳方法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31440747/

相关文章:

postgresql - 在 PostgresQL 中将整数转换为时间戳

string - 插入错误类型时sqlite自动将字符串转换为整数

Java - 避免创建空的子类和接口(interface)或生成 Java 源代码模板

java - 依赖更新后gradle构建失败

java - 从 : android. os.Handler.handleCallback(Handler.java:733) 调用未实现的 WebView 方法运行

java - 用于范围分页的数据库缓存策略

Java 字符串连接

c++ - 在 C++ 中将字符 vector 转换为字符串 vector

C++ 字符串操作,字符串下标超出范围

types - 在 F# 中定义非零整数类型