php - The definitive PHP URL parser

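Before you tell me to use parse_url: it isn't good enough and has too many bugs. There are many questions on here about parsing URLs, but nearly all of them either parse only some specific class of URL or are otherwise incomplete.

I'm looking for a definitive, RFC-compliant URL parser in PHP that reliably handles any URL a browser is likely to encounter. In that I include: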

Page-internal links #, #title
Page-relative URLs blah/thing.php
Site-relative URLs /blah/thing.php
Anonymous-protocol URLs //ajax.googleapis.com/ajax/libs/jquery/1.8.1/jquery.min.js
Callto URLs callto:+442079460123
File URLs file:///Users/me/thisfile.txt
Mailto URLs mailto:user@example.com?subject=hello, mailto:?subject=hello

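It should also support all the usual scheme/auth/domain/path/query/fragment parts, break all of those elements out into an array, and set extra flags for relative and scheme-less URLs. Ideally it would come with a URL reconstructor (like http_build_url) that supports the same elements. I'd also like validation to be applied: if a URL is invalid, the parser should still make a best-guess interpretation of it but flag it as invalid, just as browsers do.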

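This answer contains a tantalising, Fermat-esque reference to such a beast, but it doesn't actually lead anywhere.

I've looked in all the major frameworks, but they only seem to provide thin wrappers around parse_url, which is generally a bad place to start because it makes so many mistakes.

So, does such a thing exist?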

Best answer

Not sure how many bugs parse_url() has, but this might help:

As the "first-match-wins" algorithm is identical to the "greedy" disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential five components of a URI reference.

The following line is the regular expression for breaking-down a well-formed URI reference into its components.

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9

Source (RFC 3986, Appendix B): https://www.rfc-editor.org/rfc/rfc3986#page-51

It breaks the location down into:

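$2 - scheme
$4 - authority (userinfo, host and port)
$5 - path
$7 - query string ($6 is the same with its leading "?")
$9 - fragment ($8 is the same with its leading "#")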
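As a rough illustration of that breakdown in PHP, the sketch below runs the RFC expression through preg_match and adds two flags for relative references. The function name split_uri_reference(), the array keys and the flag names are hypothetical, chosen only for this example. It assumes PHP 7.2 or later for PREG_UNMATCHED_AS_NULL and performs no validation.

```php
<?php
// Sketch only: split_uri_reference() and its array keys are hypothetical names.
// The pattern is the RFC 3986 Appendix B expression, delimited with "~".
const URI_REFERENCE_REGEX = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

function split_uri_reference(string $uri): array
{
    // PREG_UNMATCHED_AS_NULL reports absent optional groups as null, so an
    // undefined component (null) stays distinct from an empty one ('').
    preg_match(URI_REFERENCE_REGEX, $uri, $m, PREG_UNMATCHED_AS_NULL);

    // "?? null" covers PHP versions that drop trailing unmatched groups.
    $scheme    = $m[2] ?? null;
    $authority = $m[4] ?? null;

    return [
        'scheme'    => $scheme,
        'authority' => $authority,      // userinfo@host:port, not split further
        'path'      => $m[5] ?? '',     // always present, possibly empty
        'query'     => $m[7] ?? null,
        'fragment'  => $m[9] ?? null,
        // No scheme means the string is a relative reference.
        'is_relative'        => $scheme === null,
        // "//host/path" style: no scheme, but an authority is present.
        'is_scheme_relative' => $scheme === null && $authority !== null,
    ];
}
```

With this sketch, file:///Users/me/thisfile.txt yields an empty-string authority, while mailto:?subject=hello yields a null authority and an empty path. Splitting the authority into userinfo, host and port, and flagging invalid input, would still have to be layered on top.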

To rebuild it, you can use:

$1 . $3 . $5 . $6 . $8
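A minimal PHP sketch of that rebuild step follows. rejoin_groups() concatenates the raw capture groups exactly as above. recompose_uri_reference() instead rebuilds from separate components, following the recomposition pseudocode in RFC 3986 section 5.3, which is more useful when a component has been changed in between. Both function names and the array keys (scheme, authority, path, query, fragment, with null meaning "not present") are assumptions made for this example.

```php
<?php
// Sketch only: both function names and the array keys are hypothetical.

// Rebuild by concatenating capture groups 1, 3, 5, 6 and 8, which already
// carry their delimiters (":", "//", "?", "#").
function rejoin_groups(string $uri): string
{
    $regex = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($regex, $uri, $m);

    // Unmatched groups are '' or missing from $m, so default each to ''.
    return ($m[1] ?? '') . ($m[3] ?? '') . ($m[5] ?? '')
         . ($m[6] ?? '') . ($m[8] ?? '');
}

// Rebuild from components, adding a delimiter only when the component is
// defined. isset() is false for null but true for '', so an empty authority
// (as in file:///path) still produces "//".
function recompose_uri_reference(array $parts): string
{
    $uri = '';
    if (isset($parts['scheme'])) {
        $uri .= $parts['scheme'] . ':';
    }
    if (isset($parts['authority'])) {
        $uri .= '//' . $parts['authority'];
    }
    $uri .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $uri .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $uri .= '#' . $parts['fragment'];
    }
    return $uri;
}
```

Keeping null and the empty string distinct is the design choice that matters here: it lets mailto:?subject=hello and file:///Users/me/thisfile.txt survive a split-and-rebuild round trip unchanged.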

Regarding "php - The definitive PHP URL parser", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/12687790/
