Regular expression to match any URL character

I came across a specification that describes a field as:

Any URL char

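I want to validate it on my side with a regex.

I searched, and although I found this great SO question that contains every piece of information I need, it seemed a shame that no question asks for the regex itself, so here I am.

What is the correct regex to match any URL character?

Edit

I derived the following regex from my reading of the specification: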

[\w\-.~:/?#\[\]@!$&'()*+,;=%]

So, is this regex correct and exhaustive, or did I miss something?

After reading the specification, I guess it is just "all ASCII characters".

Best answer

See the Characters section:

A URI is composed from a limited set of characters consisting of digits, letters, and a few graphic symbols. A reserved subset of those characters may be used to delimit syntax components within a URI while the remaining characters, including both the unreserved set and those reserved characters not acting as delimiters, define each component's identifying data.

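Although this suggests that only digits, letters, and a few symbols are supported, Appendix B. Parsing a URI Reference with a Regular Expression gives a suggested regex for parsing a URI, and that regex will actually match almost any character: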

The following line is the regular expression for breaking-down a well-formed URI reference into its components.
 ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9
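
To see how permissive that expression is, here is a minimal Python sketch that applies it unchanged. The function name `split_uri` and the dictionary keys are only illustrative and do not come from the specification; the group numbers follow the numbering row printed under the regex.

```python
import re

# The regular expression from Appendix B, copied verbatim.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference: str) -> dict:
    """Split a URI reference into its five components (hypothetical helper)."""
    match = URI_REFERENCE.match(reference)
    # Every group is optional, so the pattern matches at position 0
    # of any string, including the empty one.
    assert match is not None
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_uri("not a url ☃"))
```

The second call does not fail: the whole string simply ends up in the path component. The expression is meant for breaking down a reference that is already well-formed, not for validating its characters.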

The [\w.~:/?#\[\]@!$&'()*+,;=%-] pattern you collected is too restrictive unless \w is Unicode-aware (URIs may contain any Unicode letters); if it is, the pattern may work more or less for you.
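
As a rough illustration of the difference \w makes, the Python sketch below compiles that character class twice. In Python 3, \w is Unicode-aware by default for str patterns, and the `re.ASCII` flag limits it to `[a-zA-Z0-9_]`. The helper name `is_url_chars` and its `ascii_only` parameter are made up for this example.

```python
import re

# The character class from the question, with the hyphen moved to the end.
CHAR_CLASS = r"[\w.~:/?#\[\]@!$&'()*+,;=%-]"

# Default str patterns: \w also matches non-ASCII letters and digits.
URL_CHARS_UNICODE = re.compile(rf"{CHAR_CLASS}+")
# re.ASCII restricts \w to [a-zA-Z0-9_].
URL_CHARS_ASCII = re.compile(rf"{CHAR_CLASS}+", re.ASCII)


def is_url_chars(value: str, ascii_only: bool = False) -> bool:
    """Return True if every character of value is in the class (hypothetical helper)."""
    pattern = URL_CHARS_ASCII if ascii_only else URL_CHARS_UNICODE
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return pattern.fullmatch(value) is not None


print(is_url_chars("https://example.com/a?b=c#d"))
print(is_url_chars("https://example.com/café"))
print(is_url_chars("https://example.com/café", ascii_only=True))
print(is_url_chars("has a space"))
```

Whether the non-ASCII form should be accepted depends on what the consuming specification means by "URL char", so treat the flag as a choice to make rather than a recommendation.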

If you plan to match only ASCII URLs, use ^[\x00-\x7F]+$ (one or more of any ASCII character) or ^[!-~]+$ (visible ASCII only).
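
A minimal Python sketch of these two checks follows. The function names are illustrative. It uses `fullmatch` instead of the `^...$` anchors because, in Python, `$` also matches just before a trailing newline, which would let a value like "abc\n" pass the visible-ASCII check.

```python
import re

ANY_ASCII = re.compile(r"[\x00-\x7F]+")   # any ASCII character, controls included
VISIBLE_ASCII = re.compile(r"[!-~]+")     # printable ASCII without the space


def is_ascii(value: str) -> bool:
    """True if value is one or more ASCII characters (hypothetical helper)."""
    return ANY_ASCII.fullmatch(value) is not None


def is_visible_ascii(value: str) -> bool:
    """True if value is one or more visible ASCII characters (hypothetical helper)."""
    return VISIBLE_ASCII.fullmatch(value) is not None


print(is_ascii("a b\t"), is_visible_ascii("a b\t"))
print(is_ascii("café"), is_visible_ascii("café"))
# The anchored form accepts a trailing newline; fullmatch does not.
print(re.match(r"^[!-~]+$", "abc\n") is not None, is_visible_ascii("abc\n"))
```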

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/43588699/
