I'm working on a project that combines several data sources keyed on registered users. One query in particular is giving me a lot of trouble:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from sqlalchemy import create_engine
# of course, the info here is obscured
prod_engine = create_engine('mysql+mysqlconnector://password@host:3306/database', pool_timeout=3600, pool_recycle=3600)
query_users = """
SELECT users.id,
       CASE
         WHEN ((users.role = '' OR users.role IS NULL) AND users.plan LIKE 'pro%')
              OR users.role REGEXP '(pro|agent|manager)' THEN 'professional'
         ELSE 'consumer'
       END AS modified_role,
       users.created_at,
       users.logged_in_at AS last_login,
       COUNT(DISTINCT folders.id) AS folder_count,
       IF(COUNT(DISTINCT folders.id) > 1, '2 or more', '0 to 1') AS folder_group,
       MIN(folders.created_at) AS first_folder_created,
       MAX(folders.created_at) AS last_folder_created
FROM users
LEFT OUTER JOIN folders
  ON folders.created_by = users.id
 AND folders.discarded = 0
 AND folders.created_at >= '2010-11-30 23:59:59'
WHERE users.invalid_email IS NULL
GROUP BY 1"""
users = pd.read_sql_query(query_users, prod_engine)
No matter what I try, I get this error (almost always within three seconds, sometimes immediately):
InterfaceError: (InterfaceError) 2013: Lost connection to MySQL server during query
I've tried a few things, such as adding the pool_timeout and pool_recycle options to the create_engine call per the documentation here: http://docs.sqlalchemy.org/en/latest/core/engines.html
I also tried users = pd.read_sql_query(query_users, prod_engine, chunksize=10000) but got the same error.
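For reference, chunksize makes read_sql_query return an iterator of DataFrames rather than one large frame, so each round-trip to the server is smaller. A minimal runnable sketch, using an in-memory SQLite database as a stand-in for the MySQL engine:

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL engine so this sketch runs anywhere.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(100)}).to_sql("users", conn, index=False)

# With chunksize, read_sql_query yields DataFrames of at most 30 rows each
# instead of materializing the whole result set at once.
chunks = pd.read_sql_query("SELECT id FROM users", conn, chunksize=30)
users = pd.concat(chunks, ignore_index=True)
```

With the real engine, iterating the chunks one at a time (instead of concatenating) also keeps peak memory down for the ~550,000-row result.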
Interestingly, whenever I run this query in Sequel Pro it works fine; it starts returning rows immediately and takes only about 10 seconds to finish completely. The output is about 550,000 rows.
I've found plenty of other threads/posts, but none of them quite covers my case:
https://groups.google.com/forum/#!topic/sqlalchemy/TWL7aWab9ww
Handle SQLAlchemy disconnect
http://blog.fizyk.net.pl/blog/reminder-set-pool_recycle-for-sqlalchemys-connection-to-mysql.html
Reading the documentation here, http://dev.mysql.com/doc/refman/5.5/en/error-lost-connection.html , I noticed this line:
Sometimes the “during query” form happens when millions of rows are being sent as part of one or more queries. If you know that this is happening, you should try increasing net_read_timeout from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete.
It seems I may need to change this option, but I can't find any mention of it in the SQLAlchemy documentation.
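There doesn't appear to be a dedicated create_engine option for net_read_timeout; one possible approach (an assumption on my part, not something the SQLAlchemy docs prescribe for this error) is to raise it per-session from a "connect" event listener, which fires for every new DBAPI connection the pool opens. The sketch below uses SQLite so it runs anywhere; the SET SESSION statements shown in comments are what you would issue on MySQL, and the 600-second value is illustrative:

```python
from sqlalchemy import create_engine, event, text

# SQLite stands in so this sketch is runnable anywhere; in practice you would
# keep the mysql+mysqlconnector:// URL from above.
engine = create_engine("sqlite://")
sessions_configured = []

@event.listens_for(engine, "connect")
def raise_session_timeouts(dbapi_conn, connection_record):
    cur = dbapi_conn.cursor()
    # On MySQL the body would be, e.g.:
    #   cur.execute("SET SESSION net_read_timeout = 600")
    #   cur.execute("SET SESSION net_write_timeout = 600")
    cur.execute("SELECT 1")  # harmless stand-in so the demo runs on SQLite
    cur.close()
    sessions_configured.append(connection_record)

with engine.connect() as conn:
    row = conn.execute(text("SELECT 1")).scalar()
```

Because the listener runs on every fresh connection, the raised timeout survives pool recycling, unlike a one-off SET issued manually.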
Has anyone run into this problem? If so, how did you solve it?
Best answer
Check your MySQL server's max_allowed_packet variable and increase it. Most of the time, MySQL dropping the connection during a query is caused by a payload that is too large.
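For reference, a sketch of checking and raising that variable (the 128 MB value is illustrative, and SET GLOBAL needs the SUPER privilege; to persist the change across restarts, also set it in my.cnf):

```sql
-- Check the current limit (in bytes)
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it for the running server; existing sessions keep their old value,
-- so reconnect afterwards.
SET GLOBAL max_allowed_packet = 134217728;  -- 128 MB, an illustrative value
```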
On "python - SQLAlchemy loses connection during query", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27866176/