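I wrote a C++ program that allows URLs to be posted on YouTube. It takes a URL as input, either typed into the program or passed in directly, and replaces every '/' and '.' in the string with '*'. The modified string is then placed on the clipboard (this part only works for Windows users).
Of course, before I can call the program usable, the conversion has to work in reverse as well, so I need to know when '.' and '/' are used in a URL. I have read this article: http://en.wikipedia.org/wiki/Uniform_Resource_Locator , and I understand that '.' is used in the "master website" part (for this URL, "en.wikipedia.org") and that '/' is used after it. But I have been to other sites, such as http://msdn.microsoft.com/en-us/library/windows/desktop/ms649048%28v=vs.85%29.aspx , where this is not the case (it even replaced '(' and ')' with "%28" and "%29" respectively!).
I also seem to have requested a .aspx file, whatever that is. In addition, there is a '.' inside the parentheses in that URL. I even tried looking at regular expressions for URLs (which I don't fully understand yet...). Can someone tell me (or link me to) the rules for using '.' and '/' in URLs?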
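Best answer
Can you explain why you are doing this convoluted thing? What are you trying to achieve? Once you have answered that, you may find you don't need to know as much as you think you do.
In the meantime, here is some information. A URL is actually made up of a number of parts: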
http:              - the "scheme" or protocol used to access the resource. "HTTP", "HTTPS",
                     "FTP", etc. are all examples of a scheme. There are many others.
//                 - separates the scheme from the host (server) address.
myserver.org       - the host. The host name is looked up through DNS (the Domain Name
                     System) and resolved to an IP address, the "phone number" of the machine
                     that can serve up the resource (like "98.139.183.24" for www.yahoo.com).
www.myserver.org   - the host with a prefix. Sometimes the same domain (`myserver.org`)
                     covers multiple servers, and the prefix sends you straight to the
                     right one (mail., www., ftp., ... it is up to the administrators of
                     the domain). Conventionally, a server that serves content intended for
                     viewing with a browser has a `www.` prefix, but there's no rule
                     that says this must be the case.
:8080/             - sometimes you see a colon followed by up to five digits after the host.
                     This indicates the PORT on the server where you are accessing data.
                     Some servers allow certain specific services on just a particular port:
                     they might have a "public access" website on port 80 and another one
                     on 8080. The https:// scheme defaults to port 443, and there are ports
                     for telnet, ftp, etc. Add these only if you REALLY know what you are doing.
/the/pa.th/        - the path, relative to DOCUMENTROOT on the server, where the
                     resource is located. `.` characters are legal here, just as they are in
                     directory structures.
file.html
file.php
file.asp
etc.               - usually the resource being fetched is a file. The file may have
                     any of a great number of extensions; some of these indicate to the
                     server that instead of sending the file straight to the requester,
                     it has to execute a program or other instructions in this file
                     and send the result of that.
                     Examples of extensions that indicate "active" pages include
                     (this is not nearly exhaustive, just "for instance"):
                       .php = contains a PHP program
                       .py  = contains a Python program
                       .js  = contains a JavaScript program
                              (usually loaded from inside an .htm or .html page)
                       .asp = "active server page", associated with Microsoft's
                              Internet Information Services (IIS); .aspx is its
                              ASP.NET successor
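?something=value&somethingElse=%23othervalue%23
                   - parameters passed to the server can appear in the URL. This can be
                     used to pass parameters, entries in a form, etc. Any character can
                     be passed here, including '.', '&', '/', ... but you can't simply
                     write those characters into the string as they are...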
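To make the breakdown above concrete, here is a minimal C++ sketch that splits a URL into those parts. The names `UrlParts` and `splitUrl` are made up for this illustration, and the code assumes the simple shape scheme://host[:port][/path][?query]. It does not handle fragments (`#...`), user names, or IPv6 addresses, so treat it as a starting point rather than a full parser.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the pieces of a URL described above.
struct UrlParts {
    std::string scheme;  // e.g. "http"
    std::string host;    // e.g. "www.myserver.org"
    std::string port;    // e.g. "8080"; empty if the URL gives none
    std::string path;    // e.g. "/the/pa.th/file.html"
    std::string query;   // e.g. "something=value"; empty if the URL gives none
};

// Splits a URL of the assumed form scheme://host[:port][/path][?query].
// Returns false if the "://" separator is missing.
bool splitUrl(const std::string& url, UrlParts& out) {
    const std::size_t schemeEnd = url.find("://");
    if (schemeEnd == std::string::npos) return false;
    out.scheme = url.substr(0, schemeEnd);

    // Host and optional port run up to the first '/' or '?' after "://".
    const std::size_t hostStart = schemeEnd + 3;
    std::size_t hostEnd = url.find_first_of("/?", hostStart);
    if (hostEnd == std::string::npos) hostEnd = url.size();
    const std::string authority = url.substr(hostStart, hostEnd - hostStart);

    // A ':' inside that region separates the host from the port.
    const std::size_t colon = authority.find(':');
    if (colon == std::string::npos) {
        out.host = authority;
        out.port.clear();
    } else {
        out.host = authority.substr(0, colon);
        out.port = authority.substr(colon + 1);
    }

    // Everything up to '?' is the path; everything after it is the query.
    const std::size_t queryStart = url.find('?', hostEnd);
    if (queryStart == std::string::npos) {
        out.path = url.substr(hostEnd);
        out.query.clear();
    } else {
        out.path = url.substr(hostEnd, queryStart - hostEnd);
        out.query = url.substr(queryStart + 1);
    }
    return true;
}
```

Calling `splitUrl` on the Wikipedia URL from the question should yield the host `en.wikipedia.org` and the path `/wiki/Uniform_Resource_Locator`. Note that '.' shows up in the host and can also show up in the path, which is why a blanket replacement of '.' and '/' loses information about which part each character came from.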
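Now comes the fun part.
A URL cannot contain certain characters (quite a few, actually). To get around this, there is a mechanism called "escaping" characters. Typically this means replacing the character with its hexadecimal code, prefixed with a % sign. That is why you will often see the space character represented as %20, and why the '(' and ')' in your MSDN URL appear as %28 and %29. Handy tables of these codes are easy to find online.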
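There are many functions available that automatically convert "illegal" characters in a URL into "legal" values.
To know exactly what is and isn't allowed, you really need to go back to the original specifications. See for example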
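As an illustration of what such a function does, here is a small C++11 sketch. The name `percentEncode` is made up for this example. It takes the conservative approach of escaping every byte that is not in the RFC 3986 "unreserved" set (letters, digits, '-', '.', '_', '~'), which is an assumption suited to encoding a single path segment or query value, not a whole URL.

```cpp
#include <string>

// Percent-encodes every byte that is not an RFC 3986 "unreserved" character.
// Hypothetical helper for illustration; real libraries offer finer control
// over which characters are left alone.
std::string percentEncode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high four bits as a hex digit
            out += hex[c & 0x0F];  // low four bits as a hex digit
        }
    }
    return out;
}
```

With this sketch, `percentEncode("(v=vs.85)")` gives `%28v%3Dvs.85%29`. Do not run it over a complete URL, because it would also escape the ':' and '/' that separate the parts.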
http://www.ietf.org/rfc/rfc1738.txt
http://www.ietf.org/rfc/rfc2396.txt
http://www.ietf.org/rfc/rfc3986.txt
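I have listed them in chronological order; the last one is the most recent.
But I repeat my question: what exactly are you trying to do here, and why?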
This question about the general format of URLs (tagged c++) comes from Stack Overflow: https://stackoverflow.com/questions/15993888/