robots.txt - Allow only one file in a directory?

I want to allow only one file in the directory /minsc and disallow the rest of that directory.

My robots.txt currently contains:

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /minsc/

The file I want to allow is /minsc/menu-leaf.png

I am afraid of breaking something, so I am not sure whether I should use:

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /minsc/
Allow: /minsc/menu-leaf.png

or

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /minsc/*    # added "*"
Allow: /minsc/menu-leaf.png

?

Thanks, and sorry for my English.

Accepted answer

According to the robots.txt website:

To exclude all files except one

This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:

User-agent: *

Disallow: /~joe/stuff/

Alternatively, you can explicitly disallow every page that should not be crawled:

User-agent: *

Disallow: /~joe/junk.html

Disallow: /~joe/foo.html

Disallow: /~joe/bar.html

According to Wikipedia, if you are going to use the "Allow" directive, you should place it before the "Disallow" for maximum compatibility:
Allow: /directory1/myfile.html
Disallow: /directory1/
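Why the order matters can be sketched with a simplified model of two ways a crawler might evaluate rules. This is only an illustration: the function names first_match and longest_match are made up for this example, the matching is plain prefix matching with no wildcard support, and real crawlers may differ in the details.

```python
# Simplified model of two rule-evaluation strategies (prefix matching only).
# Each rule is a (directive, path_prefix) pair; both functions are hypothetical.

def first_match(rules, path):
    """The first rule whose prefix matches decides; allowed if nothing matches."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True

def longest_match(rules, path):
    """The matching rule with the longest prefix decides, regardless of order."""
    best = None
    for directive, prefix in rules:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[1])):
            best = (directive, prefix)
    return best is None or best[0] == "allow"

allow_first = [("allow", "/directory1/myfile.html"), ("disallow", "/directory1/")]
disallow_first = list(reversed(allow_first))

for name, rules in (("allow first", allow_first), ("disallow first", disallow_first)):
    print(name,
          first_match(rules, "/directory1/myfile.html"),
          longest_match(rules, "/directory1/myfile.html"))
```

With Allow listed first, both strategies let the file through. With Disallow listed first, a first-match crawler blocks it, which is the compatibility problem the ordering advice avoids.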
Additionally, you should put Crawl-delay last, according to Yandex:

To maintain compatibility with robots that may deviate from the standard when processing robots.txt, the Crawl-delay directive needs to be added to the group that starts with the User-Agent record right after the Disallow and Allow directives.

So, in the end, your robots.txt file should look like this:
User-agent: *
Allow: /minsc/menu-leaf.png
Disallow: /minsc/
Crawl-delay: 10
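If you are worried about breaking something, one way to check the rules before deploying them is to feed them to Python's standard-library urllib.robotparser. This is only a rough sanity check: it shows how that one parser reads the file, not how every crawler will, and the host example.com and the file name other.png are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed above, inlined so nothing has to be fetched.
ROBOTS_TXT = """\
User-agent: *
Allow: /minsc/menu-leaf.png
Disallow: /minsc/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host; only the path is matched against the rules.
leaf_ok = parser.can_fetch("*", "https://example.com/minsc/menu-leaf.png")
other_ok = parser.can_fetch("*", "https://example.com/minsc/other.png")
delay = parser.crawl_delay("*")

print("menu-leaf.png allowed:", leaf_ok)
print("other.png allowed:", other_ok)
print("crawl delay:", delay)
```

The one file should be reported as allowed and the other path in /minsc/ as blocked, with a crawl delay of 10.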

Regarding "robots.txt - Allow only one file in a directory?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34914054/
