python - Efficiency of data extraction and transformation

Tags: python mysql

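I have a Python script that connects to a MySQL database and runs many nested SELECT queries. It is basically one giant for loop. The database is structured so that businesses have menus, menus have sections, and sections have items. The script queries all businesses, then for each business it queries all of its menus, and so on. It builds up one large dictionary along the way and then outputs it as JSON.

It looks like this: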

#!/usr/bin/env python

from bottle import route, run
import mysql.connector
import json
import collections
import datetime

def getBusinesses():
    conn = mysql.connector.connect(user="APIUser", password="abc123", host="12.34.56.78", port="54321", database="businesses")
    cursor = conn.cursor()
    objects = {}

    businessesQuery = ("SELECT * FROM business")
    cursor.execute(businessesQuery)
    businessRows = cursor.fetchall()

    businessObjects = []
    for businessRow in businessRows:
        print businessRow[0]
        businessDict = collections.OrderedDict()
        businessDict['id'] = businessRow[0]
        businessDict['business_name'] = businessRow[1]
        businessDict['business_address1'] = businessRow[2]
        businessDict['business_address2'] = businessRow[3]
        businessDict['business_city'] = businessRow[4]
        businessDict['business_state'] = businessRow[5]
        businessDict['business_zip'] = businessRow[6]
        businessObjects.append(businessDict)

        menuQuery = ("SELECT * FROM menu WHERE business_id = %s" % businessRow[0])
        cursor.execute(menuQuery)
        menuRows = cursor.fetchall()

        menuObjects = []
        for menuRow in menuRows:
            menuDict = collections.OrderedDict()
            menuDict['id'] = menuRow[0]
            menuDict['menu_name'] = menuRow[1]
            menuDict['menu_description'] = menuRow[2]
            menuDict['menu_note'] = menuRow[3]
            menuDict['business_id'] = menuRow[4]
            menuObjects.append(menuDict)

        businessDict['menus'] = menuObjects

        for menuIdx, menuRow in enumerate(menuRows):
            sectionQuery = ("SELECT * FROM menu_section WHERE menu_id = %s" % menuRow[0])
            cursor.execute(sectionQuery)
            sectionRows = cursor.fetchall()

            sectionObjects = []
            for sectionIdx, sectionRow in enumerate(sectionRows):
                sectionDict = collections.OrderedDict()
                sectionDict['id'] = sectionRow[0]
                sectionDict['section_name'] = sectionRow[1]
                sectionDict['section_note'] = sectionRow[2]
                sectionDict['section_description'] = sectionRow[3]
                sectionDict['menu_id'] = sectionRow[4]
                sectionObjects.append(sectionDict)

                businessDict['menus'][menuIdx]['sections'] = sectionObjects

                itemQuery = ("SELECT * FROM menu_item WHERE section_id = %s" % sectionRow[0])
                cursor.execute(itemQuery)
                itemRows = cursor.fetchall()

                itemObjects = []
                for itemIdx, itemRow in enumerate(itemRows):
                    itemDict = collections.OrderedDict()
                    itemDict['id'] = itemRow[0]
                    itemDict['item_name'] = itemRow[1]
                    itemDict['item_description'] = itemRow[2]
                    itemDict['item_note'] = itemRow[3]
                    itemDict['item_price'] = itemRow[4]
                    itemDict['section_id'] = itemRow[5]
                    itemObjects.append(itemDict)

                    businessDict['menus'][menuIdx]['sections'][sectionIdx]['items'] = itemObjects


    objects['businesses'] = businessObjects
    return objects

@route('/test')
def index():
    return json.dumps(getBusinesses())

run(host='192.168.1.70', port=7070)

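I would like to know whether this is an efficient way of doing things. When I host the database remotely (on WebFaction) and run the Bottle server locally, it takes nearly 40 seconds to return a few hundred rows, so something seems off. I have a feeling there is a better way to do this. I'm just not sure what it is!

Best answer

If I had to hazard a guess: notice that the rough structure of your code is: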

def getBusinesses():
    businessesQuery = ("SELECT * FROM business")
    businessRows = cursor.fetchall()

    businessObjects = []
    for businessRow in businessRows:
        menuQuery = ("SELECT * FROM menu WHERE business_id = %s" % businessRow[0])
        menuRows = cursor.fetchall()


        for menuIdx, menuRow in enumerate(menuRows):
            sectionQuery = ("SELECT * FROM menu_section WHERE menu_id = %s" % menuRow[0])
            cursor.execute(sectionQuery)
            sectionRows = cursor.fetchall()

            sectionObjects = []
            for sectionIdx, sectionRow in enumerate(sectionRows):
                itemQuery = ("SELECT * FROM menu_item WHERE section_id = %s" % sectionRow[0])
                itemRows = cursor.fetchall()

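That is, you execute nearly identical queries inside loops for menu, menu_section and especially menu_item, so the number of queries grows with the number of businesses, menus and sections. In addition, you use fetchall() to return the complete contents of each result set, yet you look at each element only once, in a loop that builds another list of objects.

What you probably want is something more like: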
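To make the cost of the nested loops concrete, here is a small sketch that counts the SELECT statements issued. The row counts and the 100 ms round-trip time are hypothetical values chosen for illustration, not measurements from the question, and `count_queries` is a made-up helper name.

```python
def count_queries(businesses, menus, sections):
    """Number of SELECTs the nested-loop version issues.

    One query for all businesses, then one per business (its menus),
    one per menu (its sections) and one per section (its items).
    """
    return 1 + businesses + menus + sections


# Hypothetical data set: 10 businesses, 30 menus, 120 sections.
nested = count_queries(10, 30, 120)

# One SELECT per table: business, menu, menu_section, menu_item.
per_table = 4

# Assumed round trip to a remote database server, in milliseconds.
round_trip_ms = 100

print("nested loops:", nested, "queries,", nested * round_trip_ms, "ms of latency")
print("per table:   ", per_table, "queries,", per_table * round_trip_ms, "ms of latency")
```

With a remote database, network latency per query tends to dominate, which would fit the long response time reported for only a few hundred rows.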

# Rows are indexed by column name below, so use a dictionary cursor.
# `conn` is the connection opened in getBusinesses().
cursor = conn.cursor(dictionary=True)

businesses = {}  # business id -> business row
cursor.execute("select * from business")
row = cursor.fetchone()
while row is not None:
    row['menus'] = {}  # menu id -> menu row
    businesses[row['id']] = row
    row = cursor.fetchone()

cursor.execute("select * from menu")
row = cursor.fetchone()
while row is not None:
    row['sections'] = {}  # section id -> section row
    businesses[row['business_id']]['menus'][row['id']] = row
    row = cursor.fetchone()

cursor.execute("select menu.business_id, menu_section.*"
               " from menu_section"
               " join menu on menu.id = menu_section.menu_id")
row = cursor.fetchone()
while row is not None:
    row['items'] = []
    menu = businesses[row['business_id']]['menus'][row['menu_id']]
    menu['sections'][row['id']] = row
    row = cursor.fetchone()

cursor.execute("select menu.business_id, menu_section.menu_id, menu_item.*"
               " from menu_item"
               " join menu_section on menu_section.id = menu_item.section_id"
               " join menu on menu.id = menu_section.menu_id")
row = cursor.fetchone()
while row is not None:
    menu = businesses[row['business_id']]['menus'][row['menu_id']]
    menu['sections'][row['section_id']]['items'].append(row)
    row = cursor.fetchone()

This way you issue far fewer queries, and you load only as much data as you can process at a time.
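The one-query-per-table idea can be tried without a MySQL server. The sketch below is a Python 3 illustration that uses the standard-library sqlite3 module as a stand-in for mysql.connector. The reduced schema, the sample rows and the `build_tree` helper are assumptions made for the example, with column names taken from the dictionary keys in the question. It keeps a flat lookup per level instead of joining, so each child row finds its parent by id.

```python
import json
import sqlite3


def build_tree(conn):
    """Build the nested business/menu/section/item structure with four queries."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # allow access by column name

    businesses, menus, sections = {}, {}, {}

    for row in cur.execute("SELECT * FROM business ORDER BY id"):
        businesses[row["id"]] = dict(row, menus=[])

    for row in cur.execute("SELECT * FROM menu ORDER BY id"):
        menu = dict(row, sections=[])
        menus[menu["id"]] = menu
        businesses[menu["business_id"]]["menus"].append(menu)

    for row in cur.execute("SELECT * FROM menu_section ORDER BY id"):
        section = dict(row, items=[])
        sections[section["id"]] = section
        menus[section["menu_id"]]["sections"].append(section)

    for row in cur.execute("SELECT * FROM menu_item ORDER BY id"):
        sections[row["section_id"]]["items"].append(dict(row))

    return {"businesses": list(businesses.values())}


# Hypothetical in-memory data, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id INTEGER PRIMARY KEY, business_name TEXT);
CREATE TABLE menu (id INTEGER PRIMARY KEY, menu_name TEXT, business_id INTEGER);
CREATE TABLE menu_section (id INTEGER PRIMARY KEY, section_name TEXT, menu_id INTEGER);
CREATE TABLE menu_item (id INTEGER PRIMARY KEY, item_name TEXT, section_id INTEGER);
INSERT INTO business VALUES (1, 'Cafe A'), (2, 'Diner B');
INSERT INTO menu VALUES (1, 'Lunch', 1), (2, 'Dinner', 1);
INSERT INTO menu_section VALUES (1, 'Starters', 1);
INSERT INTO menu_item VALUES (1, 'Soup', 1), (2, 'Salad', 1);
""")

tree = build_tree(conn)
print(json.dumps(tree, indent=2))
```

The number of queries stays at four however many rows the tables hold, and the result keeps the list-based shape that the original script serialized to JSON.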

Regarding python - Efficiency of data extraction and transformation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23100020/
