regex - Getting parts of a URL (Regex)

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

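How can I extract the following parts using regular expressions:

The subdomain (test)
The domain (example.com)
The path without the file (/dir/subdir/)
The file (file.html)
The path with the file (/dir/subdir/file.html)
The URL without the path (http://test.example.com)
(add anything else you think would be useful)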

The regex should work correctly even if I enter the following URL:

http://example.example.com/example/example/example.html

Best answer

A single regex to parse and break up a full URL, including query parameters and anchors, e.g.

https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash

^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?[^#\s]*)?(#[\w\-]+)?$

RegExp positions:

url: RegExp['$&'],
protocol: RegExp.$2,
host: RegExp.$3,
path: RegExp.$4,
file: RegExp.$6,
query: RegExp.$7,
hash: RegExp.$8
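
The positions listed above can also be read from the array returned by `exec`. This avoids the `RegExp.$N` static properties, which are legacy features that MDN marks as deprecated. The following is only a sketch: `parseUrl` is a hypothetical helper name, and the pattern has the same shape as the one above, with the query group written so that it stops at `#`.

```javascript
// Same shape as the pattern above. The query group is written as "?..." up to
// the next "#" (an assumption of this sketch) so the hash lands in group 8.
const URL_PATTERN =
  /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?[^#\s]*)?(#[\w\-]+)?$/;

// parseUrl is a hypothetical helper, not part of any library.
function parseUrl(url) {
  const m = URL_PATTERN.exec(url);
  if (m === null) {
    return null; // the pattern does not cover every valid URL
  }
  return {
    url: m[0],      // whole match, like RegExp['$&']
    protocol: m[2], // like RegExp.$2
    host: m[3],
    path: m[4],
    file: m[6],
    query: m[7],    // undefined when the URL has no query string
    hash: m[8],     // undefined when the URL has no fragment
  };
}

const parts = parseUrl("http://test.example.com/dir/subdir/file.html");
console.log(parts.host, parts.path, parts.file);
// prints: test.example.com /dir/subdir/ file.html
```

Note that this pattern requires a path and a file name, so a bare `http://test.example.com` yields `null` here; hosts with a port are not covered either.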

You can then parse the host further (it is delimited by ".") quite easily.
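
As a rough illustration of that last step, the host can be split on "." to separate the subdomain from the domain. `splitHost` is a hypothetical helper, and it assumes the domain is always the last two labels, which does not hold for suffixes such as `co.uk`.

```javascript
// splitHost is a hypothetical helper. It assumes the domain is simply the
// last two dot-separated labels, which is wrong for suffixes like "co.uk".
function splitHost(host) {
  const labels = host.split(".");
  if (labels.length <= 2) {
    // Nothing in front of the domain, e.g. "example.com" or "localhost".
    return { subdomain: "", domain: host };
  }
  return {
    subdomain: labels.slice(0, -2).join("."), // everything before the last two labels
    domain: labels.slice(-2).join("."),       // the last two labels
  };
}

const { subdomain, domain } = splitHost("test.example.com");
console.log(subdomain, domain);
// prints: test example.com
```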

What I would do is use something like this:

/*
    ^(.*:)//([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$
*/
proto $1
host $2
port $3
the-rest $4

Then parse "the rest" further, making it as specific as you need. Doing it all in one regex is, well, a bit crazy.
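
A minimal sketch of this two-stage approach follows. The outer pattern is the one shown above. The second-stage pattern and the names `splitUrl`, `dir`, and `file` are assumptions made for illustration.

```javascript
// Stage 1: the coarse pattern from the answer (proto, host, port, the rest).
const OUTER = /^(.*:)\/\/([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$/;
// Stage 2 (an assumption of this sketch): split "the rest" into path, query, hash.
const REST = /^([^?#]*)(\?[^#]*)?(#.*)?$/;

// splitUrl is a hypothetical helper, not part of any library.
function splitUrl(url) {
  const m = OUTER.exec(url);
  if (m === null) {
    return null;
  }
  const [, proto, host, port, rest] = m;
  const r = REST.exec(rest);
  const fullPath = r[1];
  // Everything up to and including the last "/" is the directory part.
  const cut = fullPath.lastIndexOf("/") + 1;
  return {
    proto,                                   // includes the trailing ":"
    host,
    port: port ? port.slice(1) : undefined,  // drop the leading ":"
    dir: fullPath.slice(0, cut),
    file: fullPath.slice(cut),
    query: r[2],
    hash: r[3],
  };
}

const u = splitUrl("http://test.example.com:8080/dir/subdir/file.html?x=1#top");
console.log(u.host, u.port, u.dir, u.file);
// prints: test.example.com 8080 /dir/subdir/ file.html
```

Keeping the stages separate means each pattern stays short enough to check by eye, and a URL with no path still parses, with `dir` and `file` as empty strings.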

Regarding "regex - Getting parts of a URL (Regex)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27745/
