python - INSERT 命令不按顺序使用 sqlalchemy 和 mysql 大行

标签 python mysql sqlalchemy

我正在使用 sqlalchemy 写入 mysql 数据库,我在其中索引一些文件并存储它们的内容。我需要写入文件,然后将具有外键的索引条目写入 files 表。但是,sqlalchemy 似乎乱序发出了 INSERT 语句。

这是一个使用模拟随机数据(减去包含服务器特定信息的配置文件)说明问题的最小功能示例:

索引/ORM.py:

#!/bin/env python2.7

from __future__ import print_function

import os

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.dialects.mysql import LONGBLOB, INTEGER
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy import create_engine

from Index import load_cfg

class Base(object):
    """
    Basic MySQL table settings
    """
    __table_args__ = {
            'mysql_engine': 'InnoDB',
            'mysql_collate': 'latin1_general_cs'
            }

Base = declarative_base(cls=Base)

class CoverageIndex(Base):
    """
    Class for coverage_index table objects
    """
    __tablename__ = 'coverage_index'

    filename = Column(String(45), primary_key=True)
    #filename = Column(String(45), ForeignKey("files.filename"), primary_key=True)
    sequence_id = Column(String(45), primary_key=True, index=True)

    def __init__(self, filename, sequence_id):
        self.filename = filename
        self.sequence_id = sequence_id

class FileRow(Base):
    """
    Class for files stored in db
    """
    __tablename__ = 'files'

    filename = Column(String(45), primary_key=True)
    contents = Column(LONGBLOB)

    def __init__(self, filename, contents):
        self.filename = filename
        self.contents = contents

cfg = load_cfg()
db_string = 'mysql://%(user)s:%(passwd)s@%(host)s/%(db)s' % cfg['db_config']
engine = create_engine(db_string, echo=True)
Base.metadata.create_all(engine)

if __name__ == '__main__':
    pass

索引.py:

#!/usr/bin/env python2.7

from __future__ import print_function

import os
import sys

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
from sqlalchemy.exc import IntegrityError

from Index.ORM import Base, CoverageIndex, FileRow, engine as db_engine

if __name__ == '__main__':
    import string, random

    data = {}
    for i in range(0,10):
        file = 'file' + str(i)
        data[file] = {
                'seqs': ['seqa' + str(i), 'seqb' + str(i)],
                'contents': '\n'.join([''.join([random.choice(string.letters) for x in range (0, 80)]) for y in range (0, 2500)])}
    #print (data)

    Base.metadata.bind = db_engine

    DBSession = sessionmaker(bind=db_engine)
    session = DBSession()

    for file, datum in data.iteritems():
        file_query = session.query(FileRow).filter(FileRow.filename == file)
        if file_query.count() > 0:
            session.query(CoverageIndex).filter(CoverageIndex.filename == file).delete(synchronize_session='fetch')
            file_query.delete(synchronize_session='fetch')
        for i in datum['seqs']: 
            # Write to DB
            fqc = file_query.count() 
            print ("No. of files: " + str(fqc))
            if fqc == 0:
                print ("Adding: ")
                fr = FileRow(
                        filename = file,
                        contents = datum['contents']
                        )
                session.add(fr)
            cov = CoverageIndex(
                    filename = file, 
                    sequence_id = i) 
            session.add(cov)
        try:
            session.commit()
        except:
            #print ("SQL Commit Failed: %s" % file)
            session.rollback()
            session.close()
            raise
    session.close()

这是一次运行的输出的一部分。我想提请您注意 2018-03-13 16:05:40,291...,292 行:

...
2018-03-13 16:05:40,287 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)                                                                    
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1                                                          
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents                                                                
FROM files                                                                                                                                     
WHERE files.filename = %s) AS anon_1                                                                                                           
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine ('file1',)                                                                          
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1                                                          
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents                                                                
FROM files                                                                                                                                     
WHERE files.filename = %s) AS anon_1                                                                                                           
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine ('file1',)                                                                          
No. of files: 0                                                                                                                                
Adding:                                                                                                                                        
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)                  
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine ('file1', 'seqa1')                                                                  
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine INSERT INTO files (filename, contents) VALUES (%s, %s)                              
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine ('file1', 'BkTsRJTcNEigPFjofFxDmwVZDXRAsPECawRUjiFZTDGWWoLZzLnGlCwQQeAFyXhLqKjPAJmme
mFNfVzF\nJlZSvwGAdoImTnBAmcrSdMRDvxNYnnMfbQXdfuXulqufiIYpqjFUgfElZSrVkvBvPTg ... (204700 characters truncated) ... trwtYOycEOuDTVxsXeGoNYKAqHlE
LGPqcimwzwAFAEsCZGBBnGzYMHgabgnGZaGmQsn\nSNjYvBwSVdXVKbmJpKdSHSXCDKKvDlkyLxOxsEfOtmlCRruqzaiPhYRocKZQEJSVrtSHncFMBMTEpWUX')                    
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1                                                          
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents                                                                
FROM files                                                                                                                                     
WHERE files.filename = %s) AS anon_1                                                                                                           
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine ('file1',)                                                                          
No. of files: 1                                                                                                                                
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)                  
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine ('file1', 'seqb1')                                                                  
2018-03-13 16:05:40,312 INFO sqlalchemy.engine.base.Engine COMMIT       
...

在这里,您可以看到 sqlalchemy 在插入 files 对象之前插入了 coverage_index 。我推测这是因为 files 对象要大得多并且需要一些时间来准备,所以引擎决定先异步运行后面的 INSERT

但是,需要先插入files条目,因为coverage_index中的filename应该是files<的外键。 (如果我在定义外键约束的情况下这样做,它会抛出异常)

我知道我可以在添加到 files 之后提交,但我更喜欢 filescoverage_index INSERT 在同一个事务中,所以它们保持同步。

那么问题来了,有没有办法强制sqlalchemy在一个事务中同步执行呢?

最佳答案

不确定这是否是最好的方法,但它似乎实现了我的目标:

flush(objects=None)

Flush all the object changes to the database.

Writes out all pending object creations, deletions and modifications to the database as INSERTs, DELETEs, UPDATEs, etc. Operations are automatically ordered by the Session’s unit of work dependency solver.

Database operations will be issued in the current transactional context and do not affect the state of the transaction, unless an error occurs, in which case the entire transaction is rolled back. You may flush() as often as you like within a transaction to move changes from Python to the database’s transaction buffer

感谢:

Is SQLAlchemy saves order in adding objects to session?

http://www.aosabook.org/en/sqlalchemy.html - 第 20.9 节工作单元

关于python - INSERT 命令不按顺序使用 sqlalchemy 和 mysql 大行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49267511/

相关文章:

python - 多图中刻度线的长度相同

python - 每次打印时覆盖打印行以实现时间算法

php - 从 MySQL 中存储的数据获取最近 5 周的数据

python - 将预构建的查询附加到 SQLAlchemy 中的scoped_session

python - 如何在 sqlalchemy execute 中传递原始 sql 中的当前日期

javascript - 如何在 Flask 中根据数据库 (PostgreSQL/SQLAlchemy) 的不同列创建 JSON 文件,以便在 JavaScript 代码中使用它?

python - Google App Engine : using users. get_current_user().user_id() 作为 NDB 实体的 id

python - gobject.MainLoop 和 tornado.IOLoop 一次?

php - 在php中自动生成用户名

mysql - 选择不同操作数应包含 1 列