I am writing a crontab job in Python with MySQLdb that pulls data from MySQL, generates XML files, and zips them. Yes, it is an archiving task that runs every day at noon.
My code:
#!/usr/bin/env python
#encoding: utf-8
from dmconfig import DmConf
#from dmdb import Dmdb
import redis
import MySQLdb
import dawnutils
import time
from datetime import datetime, timedelta, date

conf = DmConf().loadConf()
db = MySQLdb.connect(host=conf["DbHost"],user=conf['DbAccount'],passwd=conf['DbPassword'],\
        db=conf['DbName'],charset=conf['DbCharset'])
cache = redis.Redis(host=conf['RedisHost'], port=conf['RedisPort'],
        db=conf['Redisdbid'], password=conf['RedisPassword'])
#cursor = db.cursor()

def try_reconnect(conn):
    try:
        conn.ping()
    except:
        conn = MySQLdb.connect(host=conf["DbHost"],user=conf['DbAccount'],passwd=conf['DbPassword'],\
                db=conf['DbName'],charset=conf['DbCharset'])

def zip_task(device, start, stop):
    #cursor = db.cursor()
    format = "%Y%m%d%H%M%S"
    begin = time.strftime("%Y-%m-%d %H:%M:%S",time.strptime(start,format))
    end = time.strftime("%Y-%m-%d %H:%M:%S",time.strptime(stop,format))
    print "%s (%s,%s)"%(device, begin, end)
    sql = "SELECT * from `period` WHERE `snrCode` = \"%s\" AND `time` > \"%s\" AND `time` < \"%s\" ORDER BY `recId` DESC"%(device, begin, end)
    print sql
    cursor = db.cursor()
    try_reconnect(db)
    t1 = time.time()
    try:
        cursor.execute(sql)
        results = cursor.fetchall()
    except MySQLdb.Error,e:
        print "Error %s"%(e)
    print ("SQL takes %f seconds"%(time.time()-t1))
    print ("len of reconds, %d"%len(results))
    #for row in results:
        #print row

def dispatcher(devSet, start, stop):
    print "size of set: %d"%len(devSet)
    print devSet
    for dev in devSet:
        zip_task(dev, start, stop)

def archive_task_queue():
    today = datetime.now()
    oneday = timedelta(days=1)
    yesterday = today - oneday
    format = "%Y%m%d%H%M%S"
    begin = time.strftime(format, yesterday.timetuple())[:8] + '120000'
    end = time.strftime(format, today.timetuple())[:8] + '120000'
    sql = "SELECT * from `logbook` WHERE `login` > \"%s\" AND `login` < \"%s\" AND `logout` > \"%s\" AND `logout` < \"%s\""%(begin, end, begin, end)
    print sql
    cursor = db.cursor()
    reclist = []
    try:
        cursor.execute(sql)
        results = cursor.fetchall()
        for row in results:
            #print row
            reclist.append(row[1])
    except MySQLdb.Error,e:
        print "Error %s"%(e)
    #reclist = [u'A2H300001']
    if len(reclist):
        dispatcher(set(reclist), begin, end)
    db.close()

if __name__ == '__main__':
    archive_task_queue()
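In my code, I first query the device event log (logbook) to get the set of devices that had events that day, then query the data set of each device one by one. The problem shows up in this second stage of queries. Here is my console output after a run: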
SELECT * from `logbook` WHERE `login` > "20160720120000" AND `login` < "20160721120000" AND `logout` > "20160720120000" AND `logout` < "20160721120000"
size of set: 4
set([u'B1H700001', u'B1H700002', u'A1E500018', u'A2H300001'])
B1H700001 (2016-07-20 12:00:00,2016-07-21 12:00:00)
SELECT * from `period` WHERE `snrCode` = "B1H700001" AND `time` > "2016-07-20 12:00:00" AND `time` < "2016-07-21 12:00:00" ORDER BY `recId` DESC
SQL takes 0.018232 seconds
len of reconds, 597
B1H700002 (2016-07-20 12:00:00,2016-07-21 12:00:00)
SELECT * from `period` WHERE `snrCode` = "B1H700002" AND `time` > "2016-07-20 12:00:00" AND `time` < "2016-07-21 12:00:00" ORDER BY `recId` DESC
SQL takes 0.974020 seconds
len of reconds, 4642
A1E500018 (2016-07-20 12:00:00,2016-07-21 12:00:00)
SELECT * from `period` WHERE `snrCode` = "A1E500018" AND `time` > "2016-07-20 12:00:00" AND `time` < "2016-07-21 12:00:00" ORDER BY `recId` DESC
SQL takes 0.342373 seconds
len of reconds, 0
A2H300001 (2016-07-20 12:00:00,2016-07-21 12:00:00)
SELECT * from `period` WHERE `snrCode` = "A2H300001" AND `time` > "2016-07-20 12:00:00" AND `time` < "2016-07-21 12:00:00" ORDER BY `recId` DESC
SQL takes 68.173677 seconds
len of reconds, 5794
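The query times are odd: B1H700002 takes 0.9 seconds for 4642 rows, while A2H300001 takes 68 seconds for 5794 rows.
I then narrowed the problem down to querying only that one device ID (you can find the commented-out line for this in my code above). The result is the same: the query takes 65 seconds.
Any clues?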
Best answer
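I experimented further with this SQL query and eventually found that the problem is related to MySQLdb's memory usage. Although the full result set is only 5794 rows, the query takes just 0.3 seconds if I add LIMIT 5000, and more than 60 seconds without it.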
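For anyone repeating this kind of experiment, it can help to time the execute() and fetchall() calls separately instead of together, as the question's code does. The following is only a minimal sketch: timed_query is a hypothetical helper name, and it assumes any DB-API style cursor (such as one from db.cursor()). It shows where the time is spent, not why.

```python
import time


def timed_query(cursor, sql, params=None):
    """Run one query and return (rows, execute_seconds, fetch_seconds).

    `cursor` is assumed to be a DB-API cursor, e.g. from MySQLdb's
    db.cursor(); `timed_query` itself is a hypothetical helper.
    """
    t0 = time.time()
    cursor.execute(sql, params)   # send the query to the server
    t1 = time.time()
    rows = cursor.fetchall()      # hand the rows over to Python
    t2 = time.time()
    return rows, t1 - t0, t2 - t1
```

Comparing the two durations with and without LIMIT 5000 would show which of the two calls accounts for the difference.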
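So, as a workaround, I use LIMIT with simple pagination to fetch a limited number of rows per query and append each page to the rows from the previous queries. The total time dropped to under 1 second.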
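The paging workaround could look roughly like the sketch below. It is not the exact code I ran: fetch_all_paged and make_mysql_page_fetcher are hypothetical names, the page size of 1000 is arbitrary, and the table and column names are taken from the question. Unlike the question's code, it passes values as query parameters instead of formatting them into the SQL string, and it assumes recId is unique so that ORDER BY gives a stable order across pages.

```python
def fetch_all_paged(fetch_page, page_size=1000):
    """Collect all rows by calling fetch_page(limit, offset) repeatedly.

    Stops as soon as a page comes back shorter than page_size.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return rows


def make_mysql_page_fetcher(conn, device, begin, end):
    """Build a fetch_page callable for the `period` query in the question.

    `conn` is assumed to be an open MySQLdb connection.
    """
    sql = ("SELECT * FROM `period` "
           "WHERE `snrCode` = %s AND `time` > %s AND `time` < %s "
           "ORDER BY `recId` DESC LIMIT %s OFFSET %s")

    def fetch_page(limit, offset):
        cursor = conn.cursor()
        try:
            # Values are sent as parameters, not pasted into the SQL text.
            cursor.execute(sql, (device, begin, end, limit, offset))
            return list(cursor.fetchall())
        finally:
            cursor.close()

    return fetch_page
```

Inside zip_task this would be called as results = fetch_all_paged(make_mysql_page_fetcher(db, device, begin, end)). Note that when the row count is an exact multiple of the page size, one extra query returning an empty page is issued before the loop stops.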
Source: the Stack Overflow question on Python MySQLdb response times differing drastically on similar-sized sets: https://stackoverflow.com/questions/38492932/