I have a large amount of data that I want to load into a database using GORM.
import grails.transaction.Transactional

class DbLoadingService {

    static transactional = false

    // these are used to expedite the batch loading process
    def sessionFactory
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    // these are example services that will assist in the parsing of the input data
    def auxLoadingServiceA
    def auxLoadingServiceB

    def handleInputFile(String filename) {
        def inputFile = new File(filename)
        // parse each line and process according to record type
        // (eachLine passes a 1-based line number as the second argument)
        inputFile.eachLine { line, lineNumber ->
            this.handleLine(line, lineNumber)
        }
    }

    @Transactional
    def handleLine(String line, int lineNumber) {
        // do some further parsing of the line, based on its content
        // example here is based on 1st 2 chars of line
        switch (line[0..1]) {
            case 'AA':
                auxLoadingServiceA.doSomethingWithLine(line)
                break
            case 'BB':
                auxLoadingServiceB.doSomethingElseWithLine(line)
                break
            default:
                break
        }
        // flush and clear the session every 100 lines
        if (lineNumber % 100 == 0) cleanUpGorm()
    }

    def cleanUpGorm() {
        def session = sessionFactory.getCurrentSession()
        session.flush()
        session.clear()
        propertyInstanceMap.get().clear()
    }
}

class AuxLoadingServiceA {
    static transactional = false

    def doSomethingWithLine(String line) {
        // do something here
    }
}

class AuxLoadingServiceB {
    static transactional = false

    def doSomethingElseWithLine(String line) {
        // do something else here
    }
}
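I have deliberately made the top-level service transactional only around the load of each line. In reality there are many levels of services below the top level, not just the single Aux A & B service layer shown. I therefore don't want the overhead of multiple layers of transactions: I think I should only need one.
The data model being loaded into the database includes a couple of domain objects with hasMany / belongsTo relationships. The interaction with these domain objects happens in the sub-layers, so it is not shown in my code, to keep the example to a manageable size.
The domain objects that seem to be causing the problem are similar to the following: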
class Parent {
    static hasMany = [children: Child]
    static mapping = {
        children lazy: false
        cache true
    }
}

class Child {
    String someValue
    // also contains some other sub-objects
    static belongsTo = [parent : Parent]
    static mapping = {
        parent index: 'parent_idx'
        cache true
    }
}
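The cleanUpGorm() method shown is needed; without it the service grinds to a complete halt after a large number of lines.
When I kick off the database load synchronously, everything works exactly as expected: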
// Called from within a service / controller
dbLoadingService.handleInputFile("someFile.txt")
However, as soon as I move the load into an asynchronous process, like this:
def promise = task {
    dbLoadingService.handleInputFile("someFile.txt")
}
I get a DuplicateKeyException / NonUniqueObjectException:
error details: org.springframework.dao.DuplicateKeyException: A different object with the same identifier value was already associated with the session : [com.example.SampleDomainObject#1]; nested exception is org.hibernate.NonUniqueObjectException: A different object with the same identifier value was already associated with the session : [com.example.SampleDomainObject#1]
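So my question is: what are the best practices for loading large amounts of data asynchronously into a Grails DB? Does something need to be done about flushing / clearing the session to ensure that in-memory objects stay consistent within the session? Does anything need to be done when objects are cached?
Best answer
The solution was to do as @JoshuaMoore suggested and use a new session. In addition, a domain object was being referenced that had been loaded outside the transaction, and merge() was never called on it in the new session, which is what caused the error.
For example: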
def obj = DomainObject.findBySomeProperty('xyz')
// now start new session
obj.someProperty // causes exception
obj = obj.merge()
obj.someProperty // doesn't cause an exception
Joshua's comment prompted me to dig into the Hibernate documentation (https://docs.jboss.org/hibernate/orm/3.6/reference/en-US/html/transactions.html).
Specifically, from the start of chapter 13:
A SessionFactory is an expensive-to-create, threadsafe object, intended to be shared by all application threads. It is created once, usually on application startup, from a Configuration instance.
A Session is an inexpensive, non-threadsafe object that should be used once and then discarded for: a single request, a conversation or a single unit of work. A Session will not obtain a JDBC Connection, or a Datasource, unless it is needed. It will not consume any resources until used.
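Others may be interested to know that, even with the performance optimizations suggested by Burt Beckwith and explained in more detail by Ted Naleid, batch-loading performance still degraded gradually as the number of parsed objects grew.
So, taking the hint from the documentation, the answer to the performance problem is not to try to use a single session for all of the processing, but to use a session for a small amount of processing, then discard it and create a new one.
When I removed the cleanUpGorm() method from my question and replaced it with the following, I got a 6-fold increase in performance, and loading time no longer grew with the amount of data processed, even after tens of millions of records had been parsed: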
// somewhere in the service method that is doing the batch parse
def currentSession = sessionFactory.openSession()
// start some form of batch parse, perhaps in a loop
// do work here
// periodically, perhaps in the %N way shown above
currentSession.flush()
currentSession.close()
currentSession = sessionFactory.openSession()
// end of loop
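The session-rotation outline above can be fleshed out as a small helper. This is only a sketch under assumptions: processWithSessionRotation, FakeSession and FakeSessionFactory are hypothetical names that do not appear in the original answer, and the fakes stand in for Hibernate's SessionFactory and Session purely so that the example runs without a database. The helper relies only on openSession(), flush() and close().

```groovy
// Hypothetical helper (not from the original answer): processes items in
// batches, giving every batch its own short-lived session.
// `sessionFactory` is assumed to offer openSession(), and each session
// flush() and close(), as Hibernate's SessionFactory and Session do.
def processWithSessionRotation(Iterable items, int batchSize, sessionFactory, Closure work) {
    def iterator = items.iterator()
    int count = 0
    while (iterator.hasNext()) {
        def session = sessionFactory.openSession()
        try {
            int inBatch = 0
            while (inBatch < batchSize && iterator.hasNext()) {
                work(session, iterator.next())
                inBatch++
                count++
            }
            session.flush()   // push pending changes for this batch
        } finally {
            session.close()   // discard the session even if the batch failed
        }
    }
    return count
}

// Stand-ins for Hibernate, used only to make the sketch runnable (assumption).
class FakeSession {
    int saved = 0
    boolean open = true
    void save(Object o) { saved++ }
    void flush() {}
    void close() { open = false }
}

class FakeSessionFactory {
    List<FakeSession> opened = []
    FakeSession openSession() {
        def s = new FakeSession()
        opened << s
        return s
    }
}

def factory = new FakeSessionFactory()
def total = processWithSessionRotation(1..250, 100, factory) { session, item ->
    session.save(item)
}
```

With a real SessionFactory, the closure would do the actual parsing and saving against the session it receives. Closing the session in a finally block keeps a failed batch from leaking a session.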
Where I needed to wrap things in a transaction that spans services, I did the following:
currentSession = sessionFactory.openSession()
currentSession.beginTransaction()
// start loop
// do work
// when we want to commit
def tx = currentSession?.getTransaction()
if (tx?.isActive()) tx.commit()
currentSession?.close()
// if we're in a loop and we need a new transaction
currentSession = sessionFactory.openSession()
currentSession.beginTransaction()
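The transaction-per-batch outline above leaves out error handling. One possible shape for it is sketched below, with a rollback when a batch fails. The helper name processInTransactions and the Stub* classes are hypothetical and exist only to make the example runnable. The calls used on the session and transaction (beginTransaction(), isActive(), commit(), rollback(), close()) mirror the Hibernate API used in the snippet above.

```groovy
// Hypothetical helper (not from the original answer): one session and one
// transaction per batch; a failing batch is rolled back and the error rethrown.
def processInTransactions(Iterable items, int batchSize, sessionFactory, Closure work) {
    def iterator = items.iterator()
    int committed = 0
    while (iterator.hasNext()) {
        def session = sessionFactory.openSession()
        def tx = null
        try {
            tx = session.beginTransaction()
            int inBatch = 0
            while (inBatch < batchSize && iterator.hasNext()) {
                work(session, iterator.next())
                inBatch++
            }
            tx.commit()
            committed += inBatch
        } catch (Exception e) {
            if (tx?.isActive()) tx.rollback()
            throw e
        } finally {
            session.close()
        }
    }
    return committed
}

// Stand-ins for Hibernate, used only to make the sketch runnable (assumption).
class StubTransaction {
    boolean active = true
    boolean committed = false
    boolean rolledBack = false
    void commit() { committed = true; active = false }
    void rollback() { rolledBack = true; active = false }
}

class StubSession {
    StubTransaction tx
    boolean open = true
    int saved = 0
    StubTransaction beginTransaction() { tx = new StubTransaction(); return tx }
    void save(Object o) { saved++ }
    void close() { open = false }
}

class StubSessionFactory {
    List<StubSession> opened = []
    StubSession openSession() {
        def s = new StubSession()
        opened << s
        return s
    }
}

// Demo: the record with value 4 is treated as bad, so the second batch fails.
def factory = new StubSessionFactory()
def failure = null
try {
    processInTransactions(1..5, 2, factory) { session, item ->
        if (item == 4) throw new IllegalStateException("bad record ${item}")
        session.save(item)
    }
} catch (IllegalStateException e) {
    failure = e
}
```

In this sketch a failing batch aborts the whole load after rolling back. Whether to stop or to skip the batch and continue is a design choice the original answer does not address.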
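While I accept that using something like Spring Batch might be better, it would have meant throwing away a lot of code that otherwise works. I will look into it the next time I need to do this. In the meantime, I hope this is useful to others who need to do large-scale batch processing with Grails and find that performance degrades with batch size.
Note to Joshua: many thanks for your help, it is much appreciated!
Regarding grails - Grails DuplicateKeyException/NonUniqueObjectException when batch loading in an Async Promise, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28439754/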