java - Is there a general-purpose Java library that handles URL encoding/decoding for a collection of strings?

Tags: java urlencode urldecode

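I often need to URL-encode or URL-decode large collections or arrays of strings. Besides iterating over them and calling the static URLDecoder.decode(string, "UTF-8"), is there a library that would improve the performance of this kind of operation?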

A colleague insists that decoding strings in place with the static method is not thread-safe. Why would that be?

Best answer

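The JDK's URLDecoder is not implemented efficiently. Most notably, it relies internally on StringBuffer, which introduces synchronization that URLDecoder does not need. Apache Commons provides URLCodec, which has reportedly had similar performance problems, though I have not verified whether that is still the case in the latest version.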

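Mark A. Ziesemer wrote a post a while back about the problems and performance of URLDecoder. He logged some bug reports and ended up writing a complete replacement. Because this is SO, I'll quote some key excerpts here, but you should really read the whole source article: http://blogger.ziesemer.com/2009/05/improving-url-coder-performance-java.html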

Selected quotes:

Java provides a default implementation of this functionality in java.net.URLEncoder and java.net.URLDecoder. Unfortunately, it is not the best performing, due to both how the API was written as well as details within the implementation. A number of performance-related bugs have been filed on sun.com in relation to URLEncoder.

There is an alternative: org.apache.commons.codec.net.URLCodec from Apache Commons Codec. (Commons Codec also provides a useful implementation for Base64 encoding.) Unfortunately, Commons' URLCodec suffers some of the same issues as Java's URLEncoder/URLDecoder.

...

Recommendations for both the JDK and Commons:

When constructing any of the "buffer" classes, e.g. ByteArrayOutputStream, CharArrayWriter, StringBuilder, or StringBuffer, estimate and pass-in an estimated capacity. The JDK's URLEncoder currently does this for its StringBuffer, but should do this for its CharArrayWriter instance as well. Common's URLCodec should do this for its ByteArrayOutputStream instance. If the classes' default buffer sizes are too small, they may have to resize by copying into new, larger buffers - which isn't exactly a "cheap" operation. If the classes' default buffer sizes are too large, memory may be unnecessarily wasted.

Both implementations are dependent on Charsets, but only accept them as their String name. Charset provides a simple and small cache for name lookups - storing only the last 2 Charsets used. This should not be relied upon, and both should accept Charset instances for other interoperability reasons as well.

Both implementations only handle fixed-size inputs and outputs. The JDK's URLEncoder only works with String instances. Commons' URLCodec is also based on Strings, but also works with byte[] arrays. This is a design-level constraint that essentially prevents efficient processing of larger or variable-length inputs. Instead, the "stream-supporting" interfaces such as CharSequence, Appendable, and java.nio's Buffer implementations of ByteBuffer and CharBuffer should be supported.

...

Note that com.ziesemer.utils.urlCodec is over 3x as fast as the JDK URLEncoder, and over 1.5x as fast as the JDK URLDecoder. (The JDK's URLDecoder was faster than the URLEncoder, so there wasn't as much room for improvement.)
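The quoted recommendation that the codecs accept Charset instances has since been partly addressed in the JDK itself: Java 10 added URLEncoder.encode(String, Charset) and URLDecoder.decode(String, Charset). The following is a minimal sketch of a batch decoder built on that overload, assuming Java 10 or later and UTF-8 input. The class name BatchUrlDecoder and the method decodeAll are hypothetical names chosen for illustration; they are not part of the JDK or of the quoted post.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK or the quoted post.
final class BatchUrlDecoder {
    private BatchUrlDecoder() {}

    /** Decodes every entry as UTF-8 and returns the results in iteration order. */
    static List<String> decodeAll(Collection<String> encoded) {
        // Pre-size the result list, in the spirit of the capacity advice quoted above.
        List<String> decoded = new ArrayList<>(encoded.size());
        for (String s : encoded) {
            // Java 10+ overload: takes a Charset instance, so there is no
            // charset-name lookup and no checked UnsupportedEncodingException.
            decoded.add(URLDecoder.decode(s, StandardCharsets.UTF_8));
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println(decodeAll(List.of("a%20b", "c%2Bd+e"))); // prints [a b, c+d e]
    }
}
```

This only removes the charset-name lookup and the checked exception; whether it is measurably faster for a given workload would need to be benchmarked.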

I think your colleague is wrong that URLDecoder is not thread-safe. Other answers here explain this in detail.
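To make the thread-safety point concrete: URLDecoder.decode takes an immutable String and returns a new String, and it keeps its working buffer in local variables, so concurrent calls do not share mutable state. Here is a minimal sketch, assuming Java 10 or later, that decodes a list from several threads through a parallel stream. ParallelUrlDecoder and decodeAllInParallel are hypothetical names used only for this example.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
final class ParallelUrlDecoder {
    private ParallelUrlDecoder() {}

    /**
     * Decodes each entry as UTF-8, potentially on several threads of the
     * common fork-join pool. The result keeps the order of the input list.
     */
    static List<String> decodeAllInParallel(List<String> encoded) {
        return encoded.parallelStream()
                // Each call works only on its own argument and local buffer.
                .map(s -> URLDecoder.decode(s, StandardCharsets.UTF_8))
                .collect(Collectors.toList());
    }
}
```

The input list itself must still not be modified by another thread while the stream runs. Parallelism may or may not pay off for short strings, so it is worth measuring before adopting it.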

Edit [2012-07-03] - in response to a later comment from the OP

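Not sure whether you are looking for more ideas? You are right that if you intend to operate on the list as an atomic collection, you will have to synchronize all access to the list, including references outside your method. However, if you are fine with the contents of the returned list possibly differing from the original list, then a brute-force approach to "batch" processing the strings in a collection that other threads may modify could look like this: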

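import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

/**
 * @param origList will be copied by this method so that origList can continue
 *                 to be read and written by other threads.
 * @return list containing the decoded string for each entry that was
 *         in origList at the time of the copy.
 */
public List<String> decodeListOfStringSafely(List<String> origList)
        throws UnsupportedEncodingException {
    // Copy first so that the loop below iterates over a private snapshot.
    List<String> snapshotList = new ArrayList<String>(origList);
    // The result size is known up front, so pre-size the list.
    List<String> newList = new ArrayList<String>(snapshotList.size());

    for (String urlStr : snapshotList) {
        String decodedUrlStr = URLDecoder.decode(urlStr, "UTF-8");
        newList.add(decodedUrlStr);
    }

    return newList;
}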

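If that doesn't help, then I'm still not sure what you are after, and you would be better off asking a new, more concise question. If this is what you were asking, then be careful, because this out-of-context example is not a good idea for a number of reasons.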

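For more on "java - Is there a general-purpose Java library that handles URL encoding/decoding for a collection of strings?", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/10402048/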
