user-agent - robots.txt: is a blank line between user-agent blocks required, or optional?

Authoritative documentation sources give seemingly contradictory descriptions.

(A "record" here means one user-agent block.)

"The file consists of one or more records separated by one or more blank lines (terminated by CR,CR/NL, or NL). Each record contains lines of the form ...".

Google's Robots.txt Specifications:

"... Note the optional use of white-space and empty lines to improve readability."

So, going by the documentation available to us, is the blank line here mandatory?

User-agent: *
Disallow: /this-directory/

User-agent: DotBot
Disallow: /this-directory/
Disallow: /and-this-directory/

Or is this OK?

User-agent: *
Disallow: /this-directory/
User-agent: DotBot
Disallow: /this-directory/
Disallow: /and-this-directory/

Best answer

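The Google Robots.txt Parser and Matcher Library has no special handling for blank lines. Python's urllib.robotparser always treats a blank line as the start of a new record, but blank lines are not strictly required, because the parser also treats a User-Agent: line that follows a rule line as the start of a new record. So both of your configurations work with either parser.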

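However, this is specific to two prominent robots.txt parsers. You should still write the file in the most common and unambiguous form, with a blank line between records, to cope with poorly written custom parsers.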
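The claim about Python's parser can be checked with a short sketch. It feeds both layouts from the question to urllib.robotparser and compares the answers. The agent name OtherBot, the sample paths, and the helper names build_parser and verdicts are made up for illustration; only RobotFileParser, parse and can_fetch come from the standard library. This says nothing about how other parsers behave.

```python
from urllib.robotparser import RobotFileParser

# Layout 1 from the question: records separated by a blank line.
WITH_BLANK_LINE = """\
User-agent: *
Disallow: /this-directory/

User-agent: DotBot
Disallow: /this-directory/
Disallow: /and-this-directory/
"""

# Layout 2 from the question: no blank line between records.
WITHOUT_BLANK_LINE = """\
User-agent: *
Disallow: /this-directory/
User-agent: DotBot
Disallow: /this-directory/
Disallow: /and-this-directory/
"""

# Hypothetical (agent, path) pairs used only to probe the parsed rules.
CHECKS = [
    ("DotBot", "/and-this-directory/page.html"),
    ("OtherBot", "/and-this-directory/page.html"),
    ("OtherBot", "/this-directory/page.html"),
    ("DotBot", "/this-directory/page.html"),
]


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def verdicts(text):
    """Return can_fetch() results for every pair in CHECKS, in order."""
    parser = build_parser(text)
    return [parser.can_fetch(agent, path) for agent, path in CHECKS]


if __name__ == "__main__":
    print(verdicts(WITH_BLANK_LINE))
    print(verdicts(WITHOUT_BLANK_LINE))
```

Both calls should print the same list: DotBot is blocked from both directories, while any other agent is blocked only from /this-directory/.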

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/59924150/
