I'm using the field_map generator function from http://www.dabeaz.com/generators/fieldmap.py:
#!/usr/bin/env python

def field_map(dictseq, name, func):
    for d in dictseq:
        d[name] = func(d[name])
        yield d

if __name__ == '__main__':
    loglines = open("test.log")
    import re
    logpats = r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] \"(.*?)\" (\S+) (\S+) \"(.*?)\" \"(.*?)\" (\S+) \"(.*?)\" \"(.*?)\" (\S+)'
    logpat = re.compile(logpats)
    groups = (logpat.match(line) for line in loglines)
    tuples = (g.groups() for g in groups if g)
    #for t in tuples:
    #    print(t)
    colnames = ('record_id', 'elapsed_time', 'client', 'username', 'client_id', 'date',
                'http_method_url', 'status', 'size', 'http_referer', 'useragent', 'mime',
                'filter_name_reason', 'profiles', 'ipport')
    log = (dict(zip(colnames, t)) for t in tuples)
    log = field_map(log, "status", int)
    log = field_map(log, "size", lambda s: int(s) if s != '-' else 0)

    for x in log:
        print(x)
It gives this error. Any ideas?
[root@cumbria extended]# python fieldmap.py
Traceback (most recent call last):
  File "fieldmap.py", line 24, in <module>
    for x in log:
  File "fieldmap.py", line 4, in field_map
    for d in dictseq:
  File "fieldmap.py", line 5, in field_map
    d[name] = func(d[name])
ValueError: invalid literal for int() with base 10: 'status'
test.log contains data in this format:
"1356313509.519-6-10.66.54.21-8080" 2089 10.112.151.213 "anonymous@10.112.151.213" "6" [24/Dec/2012:01:45:11] "GET http://apps.facebook.com:80/thesimssocial/?fb_source=bookmark_apps&ref=bookmarks&count=2&fb_bmpos=4_2" 200 58300 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 BMID/E679E9E153" text/html "- -" "M&B-112,HTTP,QUERIES,uncachable,antivirus,REDIRECT_THIS" "10.66.54.21:8080"
Best answer
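The first line of test.log is probably a header row that contains the field names rather than their values. That is why int() receives 'status' instead of something like '200'.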
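A minimal sketch of the diagnosis and one possible workaround, assuming the header row really does carry the column names as values. The records below are hypothetical stand-ins for the parsed log, not data from the question. Because the pipeline is lazy, nothing is converted until `for x in log` pulls a record through, which is why the traceback points at that loop. Filtering out records whose status is not all digits before calling field_map keeps int() from ever seeing the string 'status':

```python
def field_map(dictseq, name, func):
    # Same generator as in the question: lazily apply func to one field.
    for d in dictseq:
        d[name] = func(d[name])
        yield d


# Hypothetical records standing in for the parsed log: the first mimics a
# header row whose values are the column names, the second a real data row.
records = [
    {'status': 'status', 'size': 'size'},
    {'status': '200', 'size': '58300'},
]

# Drop any record whose status is not all digits *before* converting it,
# so int() never sees the literal string 'status'.
log = (d for d in records if d['status'].isdigit())
log = field_map(log, 'status', int)
log = field_map(log, 'size', lambda s: int(s) if s != '-' else 0)

result = list(log)
print(result)  # [{'status': 200, 'size': 58300}]
```

If the header is known to be exactly one line, calling next(loglines) once before building the pipeline would skip it instead; that assumes the file always starts with a single header line.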
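You could make your regular expression more selective so that unsuitable lines are filtered out earlier, for example by using \d+ to match the HTTP status.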
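A sketch of that tighter pattern: it is the question's regex with only the status group changed from (\S+) to (\d+). The data line is copied from test.log in the question; the header line is hypothetical, since the real header is never shown. A non-numeric status makes match() return None, so the existing `if g` filter drops the line before field_map runs:

```python
import re

# The question's pattern with one change: the HTTP status group is (\d+)
# instead of (\S+), so a line with a non-numeric status does not match.
logpat = re.compile(
    r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d+) (\S+) '
    r'"(.*?)" "(.*?)" (\S+) "(.*?)" "(.*?)" (\S+)'
)

# Data row copied from test.log in the question.
data_line = (
    '"1356313509.519-6-10.66.54.21-8080" 2089 10.112.151.213 '
    '"anonymous@10.112.151.213" "6" [24/Dec/2012:01:45:11] '
    '"GET http://apps.facebook.com:80/thesimssocial/?fb_source=bookmark_apps'
    '&ref=bookmarks&count=2&fb_bmpos=4_2" 200 58300 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 '
    '(KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 BMID/E679E9E153" '
    'text/html "- -" "M&B-112,HTTP,QUERIES,uncachable,antivirus,REDIRECT_THIS" '
    '"10.66.54.21:8080"'
)

# Hypothetical header row: the column names used as values, so the word
# "status" sits where the numeric HTTP status would be.
header_line = (
    'record_id elapsed_time client username client_id [date] '
    '"http_method_url" status size "http_referer" "useragent" mime '
    '"filter_name_reason" "profiles" ipport'
)

# Same generator pipeline as the question; unmatched lines are skipped.
lines = [header_line, data_line]
groups = (logpat.match(line) for line in lines)
tuples = [g.groups() for g in groups if g]
print(len(tuples), tuples[0][7])  # 1 200
```

Only the header row is rejected here; the original (\S+) pattern would have accepted both lines.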
Regarding "python generator invalid literal for int()", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/14025653/