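I want to put a lot of UTF-8 content into URLs for SEO, for example post tags that are included in the URI (site.com/tags/id/TAG-NAME). However, the standard only allows ASCII characters.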
Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.
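The solution seems to be:
1. Convert the string into a sequence of bytes using the UTF-8 encoding.
2. Convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte.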
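As a rough illustration of those two steps, here is a minimal Python sketch. The function name `percent_encode` is made up for this example, and it assumes the unreserved set quoted above (letters, digits, hyphen, period, underscore, tilde) is left untouched, which is slightly more permissive than "letters and digits only". The standard library's `urllib.parse.quote` does the same job and is what you would normally call.

```python
from urllib.parse import quote

# Unreserved characters as listed in the quote above; an assumption of this
# sketch is that all of them pass through unencoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then %HH-escape every other byte."""
    out = []
    for byte in text.encode("utf-8"):  # step 1: string -> UTF-8 bytes
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)              # unreserved ASCII is kept as-is
        else:
            out.append(f"%{byte:02X}")  # step 2: byte -> %HH (uppercase hex)
    return "".join(out)


tag = "Գլխավոր_Էջ"
encoded = percent_encode(tag)
print(encoded)

# The hand-rolled version should agree with the standard library.
assert encoded == quote(tag, safe="")
```

For a real site, `quote(tag, safe="")` alone is enough for a single path segment; the loop is only there to show what happens byte by byte.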
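However, this turns readable words (and words with SEO value) into mumbo-jumbo. So I am wondering whether Google is smart enough to handle searches against URLs that contain encoded data, or whether I should try to convert those non-English characters to their semi-ASCII counterparts (which might help with Latin-based languages)?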
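For reference, the "semi-ASCII counterpart" idea could look roughly like the sketch below. The helper name `ascii_slug` is invented for this example, and the approach (Unicode NFKD decomposition, then dropping everything that is not ASCII) is just one common way to do it, not something the standard prescribes. It only works for characters that decompose into an ASCII base letter plus accents; text in other scripts simply disappears.

```python
import re
import unicodedata


def ascii_slug(tag: str) -> str:
    """Hypothetical helper: fold accented Latin letters to plain ASCII."""
    # NFKD splits "è" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", tag)
    # Encoding with errors="ignore" drops the accents and any other non-ASCII.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")
    return slug.lower()


print(ascii_slug("Crème Brûlée"))   # Latin-based text survives
print(repr(ascii_slug("Գլխավոր")))  # non-Latin text is lost entirely
```

The second call shows the limitation: for a non-Latin script the result is an empty string, so this is no substitute for percent-encoding in general.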
Best answer
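Firstly, search engines really don't care about URLs. URLs help visitors: visitors link to sites, and that is what search engines care about. URLs are easy to spam, and if search engines cared about them there would be an incentive to spam them. No major search engine wants that. allinurl: is merely a Google feature to help advanced users, not something that is factored into organic rankings. Any benefit you get from a more natural URL will probably come as a fringe benefit of the PR from an inferior search engine indexing your site, and there is some evidence that this can be negative too, with the advent of negative PR.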
From Google Webmaster Central:
Does that mean I should avoid rewriting dynamic URLs at all?
That's our recommendation, unless your rewrites are limited to removing unnecessary parameters, or you are very diligent in removing all parameters that could cause problems. If you transform your dynamic URL to make it look static you should be aware that we might not be able to interpret the information correctly in all cases. If you want to serve a static equivalent of your site, you might want to consider transforming the underlying content by serving a replacement which is truly static. One example would be to generate files for all the paths and make them accessible somewhere on your site. However, if you're using URL rewriting (rather than making a copy of the content) to produce static-looking URLs from a dynamic site, you could be doing harm rather than good. Feel free to serve us your standard dynamic URL and we will automatically find the parameters which are unnecessary.
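Personally, I don't believe it matters all that much, so long as you get more click-throughs and help users. As far as Unicode goes, you seem to misunderstand how this works: the request goes to the hex-encoded Unicode destination, and the rendering engine must know how to handle that if it wants to decode it back into something visually appealing. Google will render (aka decode) Unicode (encoded) URLs properly.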
Some browsers make this slightly more complex by always encoding the hostname portion, because of phishing attacks using ideographs that look the same.
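The answer does not name the hostname encoding, but I assume it refers to the Punycode/IDNA form (the `xn--` labels). As a small sketch under that assumption, Python's built-in `idna` codec (which implements the older IDNA 2003 rules) shows the two forms of a hostname; `bücher.example` is just a placeholder domain.

```python
# Placeholder hostname, chosen only for illustration.
display_host = "bücher.example"

# The ASCII-only form that goes over the wire and that a cautious browser
# may show in the address bar instead of the Unicode form.
wire_host = display_host.encode("idna")
print(wire_host)

# Decoding gives back the human-readable form.
assert wire_host.decode("idna") == display_host
```

Note that this is a different scheme from the %HH escaping used in the path; hostnames are not percent-encoded.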
Let me show you an example. This is the request wget makes for http://hy.wikipedia.org/wiki/Գլխավոր_Էջ:
Hypertext Transfer Protocol
GET /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB HTTP/1.0\r\n
[Expert Info (Chat/Sequence): GET /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB HTTP/1.0\r\n]
[Message: GET /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB HTTP/1.0\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Method: GET
Request URI: /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB
Request Version: HTTP/1.0
User-Agent: Wget/1.11.4\r\n
Accept: */*\r\n
Host: hy.wikipedia.org\r\n
Connection: Keep-Alive\r\n
\r\n
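As you can see, wget, like every browser, simply URL-encodes the destination for you and then sends the request to the URL-encoded destination. The URL-decoded form exists only as a visual convenience.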
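To see the round trip for yourself, here is a short Python sketch that takes the Request URI from the capture above and decodes it the way a browser's address bar would. The variable names are my own; `quote` and `unquote` are the standard `urllib.parse` functions.

```python
from urllib.parse import quote, unquote

# Request URI copied from the wget capture above.
wire_path = "/wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB"

# What a browser would display: the %HH bytes decoded as UTF-8.
display_path = unquote(wire_path)
print(display_path)

# Re-encoding the readable form yields exactly what was sent on the wire
# (quote leaves "/" alone by default).
assert quote(display_path) == wire_path
```

Both strings name the same resource; only the encoded one ever travels in the HTTP request.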
Regarding url - Are non-English (non-ASCII) characters allowed in SEO URLs?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/2015546/