I want to allow only one file in the directory /minsc, but disallow the rest of that directory.
My robots.txt currently reads:
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /minsc/
The file I want to allow is /minsc/menu-leaf.png.
I'm afraid of breaking something, so I don't know whether I should use:
A)
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /minsc/
Allow: /minsc/menu-leaf.png
or
B)
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /minsc/*   # added "*"
Allow: /minsc/menu-leaf.png
?
Thanks, and sorry for my English.
Best answer
To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/stuff/
Alternatively, you can explicitly disallow every page you want blocked:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
According to Wikipedia, if you are going to use the "Allow" directive, it should come before "Disallow" for maximum compatibility:
Allow: /directory1/myfile.html
Disallow: /directory1/
Also, according to Yandex, the Crawl-delay directive should go last:
To maintain compatibility with robots that may deviate from the standard when processing robots.txt, the Crawl-delay directive needs to be added to the group that starts with the User-Agent record, right after the Disallow and Allow directives.
So, in the end, your robots.txt file should look like this:
User-agent: *
Allow: /minsc/menu-leaf.png
Disallow: /minsc/
Crawl-delay: 10
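As a sanity check (not part of the original answer), you can verify how this robots.txt is interpreted with Python's standard-library `urllib.robotparser`, which evaluates Allow/Disallow rules in the order they appear:

```python
from urllib import robotparser

# The final robots.txt from the answer above, as a string.
robots_txt = """\
User-agent: *
Allow: /minsc/menu-leaf.png
Disallow: /minsc/
Crawl-delay: 10
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The explicitly allowed file should be fetchable...
print(rp.can_fetch("*", "/minsc/menu-leaf.png"))  # True
# ...while the rest of the directory stays blocked.
print(rp.can_fetch("*", "/minsc/other.png"))      # False
```

Because this parser applies the first matching rule, the test also illustrates why the Wikipedia advice above (Allow before Disallow) matters for order-sensitive crawlers.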
A similar question about allowing only one file in a directory via robots.txt can be found on Stack Overflow: https://stackoverflow.com/questions/34914054/