I'm trying to import 300,000 rows from many CSV files.
First, I take the CSVs and import every row into a table in my database.
Then I want to parse all the rows and insert them into the right tables that hold the relations for that data.
So I tried this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->getQuery()
    ->getResult();

$countRows = 0;
foreach ($flows as $row) {
    //some check
    $entity = new TestTable();
    $entity->setCode($row->getCode());
    //many other fields
    $this->entityManager->persist($entity);
    $this->entityManager->flush();
}
In this case the whole process takes about 5 seconds per row!
Now, if I add setMaxResults like this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->setMaxResults(100)
    ->getQuery()
    ->getResult();
It takes less than 1 second!
So I thought about fetching all the rows and splitting the work into a recursive function using setMaxResults, like this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->getQuery()
    ->getResult();

$countFlows = count($flows);
$numberOfQuery = $countFlows / 100;
for ($i = 0; $i <= $numberOfQuery; $i++) {
    $this->entityManager->clear();
    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->setFirstResult($i * 100)
        ->setMaxResults(100)
        ->getQuery()
        ->getResult();
}
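As a side note, `$countFlows / 100` is a float division, and a `<=` loop bound runs one extra, empty page when the count is an exact multiple of the page size. A small sketch (hypothetical helper, not part of the question's code) of integer paging math that covers a partial last page:

```php
<?php
// Hypothetical helper: compute the offsets for a chunked fetch.
// ceil() keeps the loop bound an integer and covers a partial last page.
function pageOffsets(int $totalRows, int $pageSize): array
{
    $offsets = [];
    $pages = (int) ceil($totalRows / $pageSize);
    for ($i = 0; $i < $pages; $i++) {
        $offsets[] = $i * $pageSize; // value you would pass to setFirstResult()
    }
    return $offsets;
}
```

For 250 rows and a page size of 100 this yields offsets 0, 100 and 200, with no trailing empty page.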
This way I create many queries split into chunks of 100 rows. Is parsing many rows and inserting them like this a good practice, or is there a better way?
Best Answer
The efficient way recommended by the official documentation of Doctrine is to take advantage of the transactional write-behind behavior of the EntityManager.
Iterating Large Results for Data-Processing
You can use the iterate() method just to iterate over a large result and no UPDATE or DELETE intention. The IterableResult instance returned from $query->iterate() implements the Iterator interface so you can process a large result without memory problems using the following approach. (See example)
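Outside Doctrine, the streaming idea the docs describe can be sketched with a plain PHP generator; the data source here is a stand-in for the hydrated rows, and the one-element array wrapper mirrors the shape that iterate() hands you:

```php
<?php
// Hypothetical stand-in for IterableResult: a generator yields one row
// at a time, so only the current row has to be kept in memory.
function streamRows(iterable $source): \Generator
{
    foreach ($source as $row) {
        yield [$row]; // iterate() wraps each object in a one-element array
    }
}

$processed = 0;
foreach (streamRows(['a', 'b', 'c']) as $row) {
    $flow = $row[0]; // $row[0] is always the object
    $processed++;
}
```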
Bulk Inserts
Bulk inserts in Doctrine are best performed in batches, taking advantage of the transactional write-behind behavior of an EntityManager. [...] You may need to experiment with the batch size to find the size that works best for you. Larger batch sizes mean more prepared statement reuse internally but also mean more work during flush. (See example)
A version mixing both techniques (inside an entity repository):
$q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
$iterableResult = $q->iterate();

$i = 0;
$batchSize = 100;
foreach ($iterableResult as $row) {
    // do stuff with the data in the row, $row[0] is always the object
    /** @var AppBundle\Entity\FlowAndata $flow */
    $flow = $row[0];

    //some check
    $entity = new TestTable();
    $entity->setCode($flow->getCode());
    //many other fields

    $this->_em->persist($entity);

    $i++;
    if (($i % $batchSize) === 0) {
        $this->_em->flush();
        // Detaches all objects from Doctrine!
        $this->_em->clear();
    } else {
        // Detach from Doctrine, so that it can be garbage-collected immediately
        $this->_em->detach($flow);
    }
}
$this->_em->flush(); // Persist objects that did not make up an entire batch
$this->_em->clear();
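The flush cadence of that snippet can be checked without a database; this sketch (a hypothetical counter, not Doctrine code) just counts how often flush() would run for a given row count:

```php
<?php
// Count the in-loop flushes plus the trailing flush for $totalRows rows,
// mirroring the ($i % $batchSize) === 0 pattern above.
function countFlushes(int $totalRows, int $batchSize): int
{
    $flushes = 0;
    for ($i = 1; $i <= $totalRows; $i++) {
        if (($i % $batchSize) === 0) {
            $flushes++; // batch is full: flush and clear
        }
    }
    return $flushes + 1; // trailing flush for any partial last batch
}
```

With 300,000 rows and a batch size of 100 this is 3,001 flushes, versus the 300,000 per-row flushes in the question's first snippet.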
About php - Doctrine inserting lots of data, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45106417/