Python Pandas - Writing a large DataFrame in chunks with to_sql

I'm using Pandas' to_sql function to write to MySQL, and it times out because the frame is large (1M rows, 20 columns).

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html

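Is there a more official way to chunk the data and write the rows in blocks? I've written my own code, which seems to work. I'd prefer an official solution, though. Thanks!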

import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    # Write `frame` to `table_name` in slices of at most `chunk_size` rows.
    start_index = 0
    end_index = min(chunk_size, len(frame))

    # Intended to replace missing values with None so they are stored as NULL.
    frame = frame.where(pd.notnull(frame), None)
    # The first chunk replaces any existing table; later chunks append to it.
    if_exists_param = 'replace'

    while start_index != end_index:
        print("Writing rows %s through %s" % (start_index, end_index))
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name, if_exists=if_exists_param)
        if_exists_param = 'append'

        # Advance both bounds, clamped to the frame length so the loop ends.
        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)

Best answer

Update: this functionality has been merged into pandas master and will be released in 0.15 (probably at the end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062

So starting from 0.15, you can specify a chunksize argument, e.g. simply do:

df.to_sql('table', engine, chunksize=20000)
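
To try the chunksize argument without a MySQL server, here is a minimal sketch. It assumes an in-memory SQLite connection in place of the MySQL engine, and the table name, column names, and sizes are made up for illustration. It writes a small frame in batches and then counts the rows that landed in the table.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in data: 1,000 rows and 3 columns.
df = pd.DataFrame({
    "a": range(1000),
    "b": range(1000),
    "c": range(1000),
})

# Assumption: an in-memory SQLite database replaces the MySQL engine here.
conn = sqlite3.connect(":memory:")

# Rows are written in batches of 200 instead of all at once.
df.to_sql("demo_table", conn, if_exists="replace", index=False, chunksize=200)

# Check that every row was written.
row_count = conn.execute("SELECT COUNT(*) FROM demo_table").fetchone()[0]
print(row_count)
```

With a real MySQL engine the call has the same shape; a smaller chunksize means more round trips, but each one is smaller, which is what helps with the timeout.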

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/24007762/
