grails - Grails DuplicateKeyException/NonUniqueObjectException when batch loading inside an async Promise

Tags: grails asynchronous gorm promise batch-processing

I have a large amount of data that I want to load into a database using GORM.

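// assumed import: the original did not show which @Transactional was used (this one needs Grails 2.3+)
import grails.transaction.Transactional

class DbLoadingService {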

    static transactional = false    
    // these are used to expedite the batch loading process
    def sessionFactory
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    // these are example services that will assist in the parsing of the input data    
    def auxLoadingServiceA
    def auxLoadingServiceB

    def handleInputFile(String filename) {
        def inputFile = new File(filename)
        // parse each line and process according to record type
        inputFile.eachLine { line, lineNumber ->
            this.handleLine(line, lineNumber)
        }
    }


    @Transactional
    def handleLine(String line, int lineNumber) {
        // do some further parsing of the line, based on its content
        // example here is based on 1st 2 chars of line
        switch (line[0..1]) {
            case 'AA':
                auxLoadingServiceA.doSomethingWithLine(line)
                break;

            case 'BB':
                auxLoadingServiceB.doSomethingElseWithLine(line)
                break;

            default:
                break;

        }
        if (lineNumber % 100 == 0) cleanUpGorm()
    }

    def cleanUpGorm() {
        def session = sessionFactory.getCurrentSession()
        session.flush()
        session.clear()
        propertyInstanceMap.get().clear()
    }

}

class AuxLoadingServiceA {
    static transactional = false

    def doSomethingWithLine(String line) {
        // do something here
    }
}

class AuxLoadingServiceB {
    static transactional = false

    def doSomethingElseWithLine(String line) {
        // do something else here
    }
}

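I have purposely made the top-level service transactional only around the load of each line. There are actually numerous levels of services below the top level, not just the single Aux A and B service layer shown. I therefore don't want the overhead of multiple layers of transactions: I think I should only need one.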

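The data model being loaded into the database includes a couple of domain objects with hasMany / belongsTo relationships. The interaction with these domain objects happens in the sub-layers and is not shown in my code, to keep the example a manageable size.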

The domain objects that seem to be causing the problem are similar to the following:
class Parent {
    static hasMany = [children: Child]
    static mapping = {
        children lazy: false
        cache true
    }
}

class Child {
    String someValue
    // also contains some other sub-objects

    static belongsTo = [parent : Parent]

    static mapping = {
        parent index: 'parent_idx'
        cache true
    }
}

The cleanUpGorm() method shown is required; without it the service grinds to a complete halt after a large number of lines.

When I kick off the database load synchronously, everything works exactly as expected:
// Called from within a service / controller
dbLoadingService.handleInputFile("someFile.txt")

However, as soon as I move the load into an asynchronous process, like so:
def promise = task {
    dbLoadingService.handleInputFile("someFile.txt")
}

I get a DuplicateKeyException / NonUniqueObjectException:
error details: org.springframework.dao.DuplicateKeyException: A different object with the same identifier value was already associated with the session : [com.example.SampleDomainObject#1]; nested exception is org.hibernate.NonUniqueObjectException: A different object with the same identifier value was already associated with the session : [com.example.SampleDomainObject#1]

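So my question is: what are the best practices for asynchronously loading large amounts of data into a Grails database? Does something need to be done about flushing / clearing the session to ensure that in-memory objects stay consistent with the session? Does something need to be done when objects are cached?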

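Best answer

The solution was to do as @JoshuaMoore suggested and use a new session. In addition, there was a reference to a domain object that had been obtained outside the transaction and on which merge() was never called in the new session, which is what caused the error.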


def obj = DomainObject.findBySomeProperty('xyz')

// now start new session

obj.someProperty // causes exception
obj = obj.merge()
obj.someProperty // doesn't cause an exception
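
To illustrate why the stale reference fails and why merge() helps, here is a toy model in plain Groovy. ToySession and Row are hypothetical stand-ins written for this sketch, not Hibernate classes; the model only assumes that a session keeps at most one managed instance per identifier and that merge() returns that managed instance.

```groovy
// Toy model only: Row and ToySession are hypothetical stand-ins, not Hibernate classes.
class Row {
    Long id
    String someProperty
}

class ToySession {
    // at most one managed instance per identifier
    private final Map<Long, Row> managed = [:]

    // simulates loading a row: returns the managed instance, creating it on first access
    Row load(Long id) {
        if (!managed.containsKey(id)) {
            managed[id] = new Row(id: id, someProperty: 'from-db')
        }
        managed[id]
    }

    // attaching a different instance whose id is already managed is rejected
    void attach(Row row) {
        Row existing = managed[row.id]
        if (existing != null && !existing.is(row)) {
            throw new IllegalStateException(
                'A different object with the same identifier value was already associated with the session: ' + row.id)
        }
        managed[row.id] = row
    }

    // copies state onto the managed instance and returns that instance
    Row merge(Row detached) {
        Row target = load(detached.id)
        target.someProperty = detached.someProperty
        target
    }
}

def first = new ToySession()
Row obj = first.load(1L)           // loaded in the first session

def second = new ToySession()      // a new session
second.load(1L)                    // the new session holds its own copy of row 1

boolean failed = false
try {
    second.attach(obj)             // obj came from the first session: conflict
} catch (IllegalStateException e) {
    failed = true
}

Row mergedObj = second.merge(obj)  // keep using the returned instance from here on
```

The point of the sketch is the last line: after a merge, the returned instance is the one the new session manages, so the old reference should be replaced, as in `obj = obj.merge()` above.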

Joshua's comment prompted me to dig into the Hibernate documentation (https://docs.jboss.org/hibernate/orm/3.6/reference/en-US/html/transactions.html).

Specifically, from Chapter 13:

A SessionFactory is an expensive-to-create, threadsafe object, intended to be shared by all application threads. It is created once, usually on application startup, from a Configuration instance.

A Session is an inexpensive, non-threadsafe object that should be used once and then discarded for: a single request, a conversation or a single unit of work. A Session will not obtain a JDBC Connection, or a Datasource, unless it is needed. It will not consume any resources until used.



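Something that may be of interest to others: even with the performance optimisations suggested by Burt Beckwith, and explained in more detail by Ted Naleid, batch-load performance gradually degraded as the number of objects parsed increased.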

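So, following the hints in the documentation, the answer to the performance problem is not to try to use one session for all of the processing, but to use a session for a small amount of processing, then discard it and create a new one.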

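When I removed the cleanUpGorm() method from the question and replaced it with the following, I got a 6-fold increase in performance, and loading time per batch did not increase at all, even after tens of millions of records had been parsed: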
// somewhere in the service method that is doing the batch parse
def currentSession = sessionFactory.openSession()

// start some form of batch parse, perhaps in a loop

    // do work here
    // periodically, perhaps in the %N way shown above
    currentSession.flush()
    currentSession.close()
    currentSession = sessionFactory.openSession()

// end of loop
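
The loop structure described above can be sketched in plain Groovy. FakeSession and FakeSessionFactory are hypothetical stand-ins that only record what happens to them (they are not Hibernate or Grails classes), and the batch size of 100 is an assumption carried over from the `% 100` check in the question. The sketch shows one short-lived session per batch, closed in a `finally` block so it is discarded even if a record fails.

```groovy
// Hypothetical stand-in for a Hibernate Session; it only records what happens to it.
class FakeSession {
    int pending = 0
    boolean open = true
    void save(String record) { assert open; pending++ }
    void flush() { pending = 0 }
    void close() { open = false }
}

// Hypothetical stand-in for sessionFactory; counts how many sessions were opened.
class FakeSessionFactory {
    int opened = 0
    FakeSession openSession() { opened++; new FakeSession() }
}

// Process records in fixed-size batches, using one short-lived session per batch.
int loadInBatches(FakeSessionFactory factory, List<String> records, int batchSize) {
    int saved = 0
    records.collate(batchSize).each { List<String> batch ->
        FakeSession session = factory.openSession()
        try {
            batch.each { session.save(it); saved++ }
            session.flush()
        } finally {
            session.close()   // always discard the session, even if a record fails
        }
    }
    saved
}

def factory = new FakeSessionFactory()
def records = (1..250).collect { "record-${it}".toString() }
int total = loadInBatches(factory, records, 100)
```

With 250 records and a batch size of 100, `collate` yields batches of 100, 100 and 50, so three sessions are opened and each one is closed before the next is created.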

Where I needed to wrap things in a transaction that spans services, I did the following:
currentSession = sessionFactory.openSession()
currentSession.beginTransaction()

// start loop
// do work

// when we want to commit
def tx = currentSession?.getTransaction()
if (tx?.isActive()) tx.commit()
currentSession?.close()

// if we're in a loop and we need a new transaction
currentSession = sessionFactory.openSession()
currentSession.beginTransaction()

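Although I accept that something like Spring Batch might be a better approach, it would have meant throwing away a large amount of code that otherwise works as intended. I'll look into it the next time I need to do this, but in the meantime I hope this is useful to others who need to do large-scale batch processing with Grails and find that performance degrades with batch size.

Note to Joshua: thank you very much for your help, it is much appreciated!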

For "grails - Grails DuplicateKeyException/NonUniqueObjectException when batch loading inside an async Promise", the original question is on Stack Overflow: https://stackoverflow.com/questions/28439754/
