php - How to detect crawlers/spiders with PHP?

Tags: php user-agent

How to detect crawlers/spiders with PHP?

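I'm currently working on a project in which I need to track every crawler visit.
I know I should use HTTP_USER_AGENT, but I'm not sure how to structure the code for this purpose. I also know the User-Agent header can easily be changed, so I'd like to know whether I can add further checks to guard against spoofing.

Sample code for what I'm trying to do: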

<?php
// The header can be absent, so fall back to an empty string.
$user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (strpos($user_agent, 'Google') !== false) {
    echo "Googlebot is here";
}
?>

Thanks

Best answer

According to Verifying Googlebot:

You can verify that a bot accessing your server really is Googlebot (or another Google user-agent) by using a reverse DNS lookup, verifying that the name is in the googlebot.com domain, and then doing a forward DNS lookup using that googlebot name. This is useful if you're concerned that spammers or other troublemakers are accessing your site while claiming to be Googlebot.

For example:

host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

Google doesn't post a public list of IP addresses for webmasters to whitelist. This is because these IP address ranges can change, causing problems for any webmasters who have hard coded them. The best way to identify accesses by Googlebot is to use the user-agent (Googlebot).
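The quoted guidance names the user-agent token as the first signal. Since the question asks about tracking every crawler, here is a minimal sketch of naming the crawler a request claims to be. The token table and the names CRAWLER_TOKENS and claimedCrawler() are illustrative assumptions, not part of the original answer, and the result is only a claim: the header is trivially spoofed, so it still needs DNS verification.

```php
<?php
// Hypothetical token table for illustration; extend it with the crawlers you track.
// Keys are substrings expected in the User-Agent, values are display names.
const CRAWLER_TOKENS = [
    'Googlebot'   => 'Google',
    'bingbot'     => 'Bing',
    'DuckDuckBot' => 'DuckDuckGo',
    'YandexBot'   => 'Yandex',
    'Baiduspider' => 'Baidu',
];

/**
 * Returns the name of the crawler the User-Agent claims to be,
 * or null when no known token matches (or the header is missing).
 */
function claimedCrawler(?string $userAgent): ?string
{
    if ($userAgent === null || $userAgent === '') {
        return null;
    }
    foreach (CRAWLER_TOKENS as $token => $name) {
        // Case-insensitive substring match on the token.
        if (stripos($userAgent, $token) !== false) {
            return $name;
        }
    }
    return null;
}
```

In a request handler this could be called as claimedCrawler($_SERVER['HTTP_USER_AGENT'] ?? null).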

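You can do a reverse DNS lookup and confirm it with a forward lookup:

function validateGoogleBotIP($ip) {
    // Reverse lookup, e.g. "crawl-66-249-66-1.googlebot.com".
    // gethostbyaddr() returns the IP unchanged on failure, or false on malformed input.
    $hostname = gethostbyaddr($ip);
    if ($hostname === false || $hostname === $ip) {
        return false;
    }
    if (!preg_match('/\.google(bot)?\.com$/i', $hostname)) {
        return false;
    }

    // Forward lookup: the name must resolve back to the same IP (IPv4 only here).
    $addresses = gethostbynamel($hostname);
    return $addresses !== false && in_array($ip, $addresses, true);
}

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (strpos($userAgent, 'Google') !== false) {
    if (validateGoogleBotIP($_SERVER['REMOTE_ADDR'])) {
        echo 'It is ACTUALLY google';
    } else {
        echo 'Someone\'s faking it!';
    }
} else {
    echo 'Nothing to do with Google';
}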
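The procedure in the quoted documentation (reverse lookup, domain check, forward lookup) can also be written so that it covers IPv6 and can be exercised without network access. This is a sketch under stated assumptions, not the answer's method: isVerifiedCrawlerIp() and resolveAllAddresses() are hypothetical names, the list of allowed domains is supplied by the caller, and the two lookup callables are injectable purely for testing.

```php
<?php
/**
 * Forward-confirmed reverse DNS check (illustrative sketch).
 *
 * $reverse maps an IP to a hostname; $forward maps a hostname to an array
 * of IP strings. Both default to PHP's own DNS functions.
 */
function isVerifiedCrawlerIp(
    string $ip,
    array $allowedDomains,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'resolveAllAddresses';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when the lookup fails.
    $hostname = $reverse($ip);
    if (!is_string($hostname) || $hostname === '' || $hostname === $ip) {
        return false;
    }
    $hostname = strtolower(rtrim($hostname, '.'));

    // Step 2: the hostname must be a subdomain of an expected domain.
    // The leading dot prevents lookalikes such as "evilgooglebot.com".
    $domainOk = false;
    foreach ($allowedDomains as $domain) {
        $suffix = '.' . strtolower($domain);
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            $domainOk = true;
            break;
        }
    }
    if (!$domainOk) {
        return false;
    }

    // Step 3: the forward lookup must include the original IP. Packed
    // binary forms are compared so that different spellings of the same
    // IPv6 address still match.
    $packedIp = inet_pton($ip);
    foreach ($forward($hostname) as $address) {
        if (@inet_pton((string) $address) === $packedIp) {
            return true;
        }
    }
    return false;
}

/** Collects the A and AAAA records of a hostname as a list of IP strings. */
function resolveAllAddresses(string $hostname): array
{
    $records = @dns_get_record($hostname, DNS_A | DNS_AAAA);
    $addresses = [];
    foreach ($records ?: [] as $record) {
        if (isset($record['ip'])) {
            $addresses[] = $record['ip'];
        }
        if (isset($record['ipv6'])) {
            $addresses[] = $record['ipv6'];
        }
    }
    return $addresses;
}
```

A call might look like isVerifiedCrawlerIp($_SERVER['REMOTE_ADDR'], ['googlebot.com', 'google.com']). DNS lookups add latency to each request, so caching the result per IP is probably worthwhile when tracking many visits.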

Regarding "php - How to detect crawlers/spiders with PHP?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19980363/
