python - How do I find how many times a particular IP has pinged a given URL?

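I have a Python script that extracts the unique IP addresses from a log file and shows how many times each IP has pinged the server (i.e. how many log lines it appears in). The code is below: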

import sys

def extract_ip(line):
    # The client IP is the first whitespace-separated field of each line.
    return line.split()[0]

def increase_count(ip_dict, ip_addr):
    if ip_addr in ip_dict:
        ip_dict[ip_addr] += 1
    else:
        ip_dict[ip_addr] = 1

def read_ips(infilename):
    res_dict = {}
    # "with" closes the log file once it has been read.
    with open(infilename) as log_file:
        for line in log_file:
            if line.isspace():
                continue
            ip_addr = extract_ip(line)
            increase_count(res_dict, ip_addr)
    return res_dict

def write_ips(outfilename, ip_dict):
    with open(outfilename, "w") as out_file:
        for ip_addr, count in ip_dict.items():
            out_file.write("%5d\t%s\n" % (count, ip_addr))

def parse_cmd_line_args():
    if len(sys.argv) != 3:
        print("Usage: %s [infilename] [outfilename]" % sys.argv[0])
        sys.exit(1)
    return sys.argv[1], sys.argv[2]

def main():
    infilename, outfilename = parse_cmd_line_args()
    ip_dict = read_ips(infilename)
    write_ips(outfilename, ip_dict)

if __name__ == "__main__":
    main()
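As a side note, the dictionary bookkeeping done by `increase_count` can be delegated to `collections.Counter` from the standard library. This is a minimal sketch, assuming the same layout as above (IP in the first column); `count_ips` is a hypothetical helper name, not part of the original script:

```python
from collections import Counter

def count_ips(lines):
    """Count how often each IP (first whitespace-separated field) occurs.

    Blank or whitespace-only lines are skipped, like in read_ips().
    """
    return Counter(line.split()[0] for line in lines if line.strip())

sample = [
    '117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -\n',
    '\n',
    '117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -\n',
    '59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -\n',
]
counts = count_ips(sample)
print(counts["117.18.231.5"], counts["59.95.13.217"])  # 2 1
```

With a real file, `with open(infilename) as f: counts = count_ips(f)` would feed it line by line without loading the whole log into memory.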

The log file has the following format and contains 2L lines in total. These are its first 30 lines:

220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -
111.92.9.222 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
120.56.236.46 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
49.138.106.21 - - [06/Mar/2012:00:00:00 -0800] "GET /add.txt HTTP/1.1" 204 214 - -
117.195.185.130 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
122.160.166.220 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /welcome.html HTTP/1.1" 204 212 - -
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
122.169.136.211 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
203.217.145.10 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -
203.217.145.10 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /css/epic.css HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" -
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /add.txt HTTP/1.1" 204 214 - -
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
118.97.38.130 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /js/flash_detect_min.js HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/home-page-bottom.jpg HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/Facebook_Like.png HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/Twitter_Follow.png HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/home-page-top.jpg HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" -
49.138.106.21 - - [06/Mar/2012:00:00:01 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
120.61.182.186 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
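These lines appear to follow the Apache/NCSA Common Log Format with two extra trailing fields (a referer and a user agent, logged as `-` when absent). Assuming that layout, a regular expression can split a line into named fields instead of relying on whitespace positions. `LOG_RE` and `parse_line` are illustrative names, not from the original post:

```python
import re

# host ident user [time] "METHOD path protocol" status size ...rest
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

line = ('117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] '
        '"GET /css/epic.css HTTP/1.1" 204 214 '
        '"http://www.epicbrowser.com/welcome.html" -')
fields = parse_line(line)
print(fields["ip"], fields["path"], fields["status"])
# 117.214.20.28 /css/epic.css 204
```

Lines that do not fit the assumed layout return `None`, so a caller can skip or log them instead of crashing on an `IndexError`.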

The output file has the following format:

    Number of Times      IPS
     158            111.92.9.222
     11             58.97.187.231
     30             212.57.209.41
     5              119.235.51.66
     3              122.168.134.106
     5              180.234.220.75
     13             115.252.223.243

Here, the IP 111.92.9.222 (from lines such as 111.92.9.222 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -) pinged Epic a total of 158 times.

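Now I want to add a feature so that, when I pass a particular URL, the script returns which IP addresses (from the log file or from the output file) accessed that URL and how many times each one did.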

For example, if I pass this URL as input: http://www.epicbrowser.com/hrefadd.xml

the output should look like this:

     10.10.128.134        4
     10.134.222.232       6

Best answer

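I assume your requirement is that you only want the IPs for one given URL. In that case you just need to add an extra filter to the program that skips the unwanted lines; the structure of the program can stay the same.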
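The "extra filter" idea can also be written as a separate generator that sits between reading and counting, so the counting code itself stays unchanged. A minimal sketch, assuming the requested path is the 7th whitespace-separated field (index 6), which is also what the answer's `urlcheck` function relies on; `lines_for_path` is a hypothetical name:

```python
def lines_for_path(lines, path):
    """Yield only the log lines whose request path equals `path`.

    Assumes the path is field index 6 after splitting on whitespace, e.g.
    '... [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" ...'.
    """
    for line in lines:
        fields = line.split()
        if len(fields) > 6 and fields[6] == path:
            yield line

log = [
    '220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
    '59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -',
    '117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
]
matching = list(lines_for_path(log, "/hrefadd.xml"))
print(len(matching))  # 2
```

Because it is a generator, it can wrap an open file object directly and still process the log one line at a time.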

Since the log file does not record anything about the host, pass only the path part of the URL as the third argument, for example "/hrefadd.xml".
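If you would rather pass the full URL from the question (for example `http://www.epicbrowser.com/hrefadd.xml`), the standard library's `urllib.parse.urlparse` (Python 3) can drop the scheme and host so that only the path is compared against the log. This is a sketch; `url_to_path` is a hypothetical helper, and it keeps any query string on the assumption that the logged request target would include it too:

```python
from urllib.parse import urlparse

def url_to_path(url):
    """Reduce a full URL to the request target as it appears in the log.

    The log only records the request target (e.g. /hrefadd.xml), never
    the host, so scheme and host are dropped. A query string, if present,
    is kept because it would normally be part of the logged target.
    """
    parts = urlparse(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return path

print(url_to_path("http://www.epicbrowser.com/hrefadd.xml"))  # /hrefadd.xml
print(url_to_path("/hrefadd.xml"))                             # /hrefadd.xml
```

In the answer's `main()`, this would amount to adding `url = url_to_path(url)` right after the arguments are parsed; a bare path such as "/hrefadd.xml" passes through unchanged.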

#!/usr/bin/env python
# 
# Counts the IP addresses in a log file that requested a given URL path.
# 
# Assumption: the IP address is logged in the first column.
# Example line: 117.195.185.130 - - [06/Mar/2012:00:00:00 -0800] \
#    "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
#

import sys

def urlcheck(line, url):
    '''Checks whether the requested path of the log line equals url.
       The path is expected in the 7th space-separated column.'''
    lsplit = line.split()
    if len(lsplit) < 7:
        return False
    return url == lsplit[6]

def extract_ip(line):
    '''Extracts the IP address from the line.
       Currently it is assumed that the IP address is logged in
       the first column and that the columns are space separated.'''
    return line.split()[0]

def increase_count(ip_dict, ip_addr):
    '''Increases the count of the IP address.
       If an IP address is not in the given dictionary,
       it is initially created and the count is set to 1.'''
    if ip_addr in ip_dict:
        ip_dict[ip_addr] += 1
    else:
        ip_dict[ip_addr] = 1

def read_ips(infilename, url):
    '''Read the IP addresses of all lines that requested url and
       count them in a dictionary - returns the dictionary.'''
    res_dict = {}
    # "with" makes sure the log file is closed after reading.
    with open(infilename) as log_file:
        for line in log_file:
            if line.isspace():
                continue
            if not urlcheck(line, url):
                continue
            ip_addr = extract_ip(line)
            increase_count(res_dict, ip_addr)
    return res_dict

def write_ips(outfilename, ip_dict):
    '''Write out the IP addresses and their counts.'''
    with open(outfilename, "w") as out_file:
        for ip_addr, count in ip_dict.items():
            out_file.write("%s\t%5d\n" % (ip_addr, count))

def parse_cmd_line_args():
    '''Return the input file name, output file name and URL path.
       If there are more or fewer than three parameters,
       a usage message is printed and the program exits.'''
    if len(sys.argv) != 4:
        print("Usage: %s [infilename] [outfilename] [url]" % sys.argv[0])
        sys.exit(1)
    return sys.argv[1], sys.argv[2], sys.argv[3]

def main():
    infilename, outfilename, url = parse_cmd_line_args()
    ip_dict = read_ips(infilename, url)
    write_ips(outfilename, ip_dict)

if __name__ == "__main__":
    main()
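Because the log is large, rereading it once per URL you want to inspect may get slow. An alternative sketch (not what the answer does) counts every (path, IP) pair in a single pass with `collections.Counter`, after which any path can be looked up without touching the file again. It assumes the same field positions as `urlcheck` and `extract_ip`; `count_path_ip` and `ips_for_path` are hypothetical names:

```python
from collections import Counter

def count_path_ip(lines):
    """Count hits per (path, ip) pair in a single pass over the log.

    Assumes the IP is field 0 and the request path is field 6;
    lines with fewer than 7 fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        counts[(fields[6], fields[0])] += 1
    return counts

def ips_for_path(counts, path):
    """Return {ip: hits} for one path from the precomputed counts."""
    return {ip: n for (p, ip), n in counts.items() if p == path}

log = [
    '117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
    '117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
    '117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
    '117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -',
]
counts = count_path_ip(log)
# Same "IP<TAB>count" layout that write_ips() produces.
for ip, n in sorted(ips_for_path(counts, "/hrefadd.xml").items()):
    print("%s\t%5d" % (ip, n))
```

The trade-off is memory: the counter holds one entry per distinct (path, IP) pair, which is usually far smaller than the log itself.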

IMHO, it would also help if the original post were quoted.

IMHO, you should keep the comments.

A similar question, "python - How do I find how many times a particular IP has pinged a given URL?", can be found on Stack Overflow: https://stackoverflow.com/questions/9678609/
