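I have been looking into using iterators for batch processing in Doctrine ( http://docs.doctrine-project.org/en/2.0.x/reference/batch-processing.html ). I have a database with 20,000 images that I would like to iterate over.
I understand that using an iterator is supposed to prevent Doctrine from loading every row into memory. However, memory usage is almost exactly the same between the two examples. I measure memory usage before and after with (memory_get_usage() / 1024).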
$query = $this->em->createQuery('SELECT i FROM Acme\Entities\Image i');
$iterable = $query->iterate();
while (($image = $iterable->next()) !== false) {
// Do something here!
}
Memory usage with the iterator:
Memory usage before: 2823.36328125 KB
Memory usage after: 50965.3125 KB
The second example loads the entire result set into memory using the findAll method.
$images = $this->em->getRepository('Acme\Entities\Image')->findAll();
Memory usage with findAll:
Memory usage before: 2822.828125 KB
Memory usage after: 51329.03125 KB
Best answer
Batch processing with Doctrine is trickier than it seems, even with the help of iterate() and IterableResult.
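As you expected, the biggest benefit of IterableResult is that it does not load all elements into memory at once. The second benefit is that it does not hold references to the entities it loads, so IterableResult does not prevent the GC from freeing the memory used by your entities.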
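However, there is another object, Doctrine's EntityManager (more specifically its UnitOfWork), which holds a reference to every object you query, whether explicitly or implicitly (through EAGER associations).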
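In short, whenever you get an entity back from findAll() or findOneBy(), or even from a DQL query or an IterableResult, a reference to that entity is saved inside Doctrine. The reference is simply stored in an associative array. In pseudocode:
$identityMap['Acme\Entities\Image'][0] = $image0;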
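So on each iteration of the loop, your previous images (although they are out of the loop's scope and out of IterableResult's scope) are still present in this identityMap. The GC cannot clean them up, and your memory consumption is the same as when you call findAll().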
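The effect of such a map can be reproduced without Doctrine. The following is a minimal sketch in plain PHP 8, not Doctrine's actual implementation: `DemoImage` and `DemoIdentityMap` are hypothetical stand-ins, and `WeakReference` is used only to observe whether the object has been freed.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an entity; this is not a Doctrine class.
final class DemoImage
{
    public function __construct(public int $id) {}
}

// Hypothetical, heavily simplified stand-in for the UnitOfWork identity map.
final class DemoIdentityMap
{
    /** @var array<string, array<int, object>> class name => id => entity */
    private array $map = [];

    public function register(object $entity, int $id): void
    {
        $this->map[$entity::class][$id] = $entity;
    }

    public function forget(string $class, int $id): void
    {
        unset($this->map[$class][$id]);
    }
}

$identityMap = new DemoIdentityMap();

$image = new DemoImage(1);
$identityMap->register($image, $image->id);

// A weak reference lets us check whether the object still exists
// without keeping it alive ourselves.
$probe = WeakReference::create($image);

// Drop our own variable, as happens when the loop moves to the next row.
unset($image);
$stillAliveWhileMapped = $probe->get() !== null; // the map still references it

// Remove the entry from the map; now nothing references the object.
$identityMap->forget(DemoImage::class, 1);
$freedAfterForget = $probe->get() === null;

var_dump($stillAliveWhileMapped, $freedAfterForget);
```

As long as the map holds the entry, dropping your own variable changes nothing. Only once the entry is removed can the object be freed.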
Now let's walk through the code and see what actually happens:
$query = $this->em->createQuery('SELECT i FROM Acme\Entities\Image i');
// Doctrine only creates the Query object here; there is no DB access yet
$iterable = $query->iterate();
// Unlike findAll(), this call does not access the DB either.
// The Query object is simply wrapped in an iterator.
while (($image_row = $iterable->next()) !== false) {
    // on the first call to next() the DB WILL BE ACCESSED FOR THE FIRST TIME
    // the first resulting row will be returned
    // the row will be hydrated into an Image object
    // ----> A REFERENCE TO THE OBJECT WILL BE SAVED INSIDE $identityMap <----
    // the row will be returned to you via next()
    // to access the actual Image object, take element [0] of the array
    $image = $image_row[0];
    // Do something here!
    write_image_data_to_file($image, 'myimage.data.bin');
    // as the iteration ends, $image (and $image_row) will go out of scope
    // and, from what we can see, should be ready for GC
    // however, because a reference to this specific Image object is still held
    // by the EntityManager (inside $identityMap), the GC will NOT clean it up
}
// and by the end of your loop you will have consumed as much memory
// as you would have by using `findAll()`.
So the first solution is to actually tell the Doctrine EntityManager to detach the object from the $identityMap. I also replaced the while loop with foreach to make it more readable.
foreach ($iterable as $image_row) {
    $image = $image_row[0];
    // do something with the image
    write_image_data_to_file($image);
    $entity_manager->detach($image);
    // this line tells Doctrine to remove the _reference_to_the_object_
    // from the identity map, so the object will be ready for GC
}
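However, the example above has a few flaws, even though it is featured in Doctrine's documentation on batch processing. It works well as long as your Image entity does not EAGER load any of its associations. But if you do eagerly load any association, e.g.:
/**
 * @ORM\Entity
 */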
class Image {
    /**
     * @ORM\Column(type="integer")
     * @ORM\Id
     */
    private $id;
    /**
     * @ORM\Column(type="string")
     */
    private $imageName;
    /**
     * This association will be automatically (EAGERly) loaded by Doctrine
     * every time you query the Image entity from the DB,
     * whether by findXXX(), DQL or iterate().
     *
     * @ORM\ManyToOne(targetEntity="Acme\Entity\User", fetch="EAGER")
     */
    private $owner;
    // getters/setters left out for clarity
}
So if we use the same snippet as above:
foreach ($iterable as $image_row) {
    $image = $image_row[0];
    // here, because of EAGER loading, the owner entity is already in memory
    // and can be accessed via $image->getOwner()
    // do something with the image
    write_image_data_to_file($image);
    $entity_manager->detach($image);
    // here we detach the Image entity, but its `$owner` `User` entity is still
    // referenced in Doctrine's `$identityMap`, so we are still leaking memory.
}
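The leak through the eager association can also be shown with a toy model. This sketch assumes PHP 8 and uses a plain array in place of the identity map. `Owner` and `OwnedImage` are hypothetical classes, and the `unset()` on the map only mimics what detaching a single entity does to the map. It is not Doctrine code.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins; these are not Doctrine classes.
final class Owner
{
    public function __construct(public int $id) {}
}

final class OwnedImage
{
    public function __construct(public int $id, public Owner $owner) {}
}

// Toy identity map: class name => id => entity.
$identityMap = [];

// Simulate hydrating one Image row together with its EAGER-loaded owner.
$owner = new Owner(7);
$image = new OwnedImage(1, $owner);
$identityMap[Owner::class][$owner->id] = $owner;
$identityMap[OwnedImage::class][$image->id] = $image;

// Weak references only observe the objects; they do not keep them alive.
$imageProbe = WeakReference::create($image);
$ownerProbe = WeakReference::create($owner);

// Mimic detaching the image: only the image entry leaves the map.
unset($identityMap[OwnedImage::class][$image->id]);
// The loop variables are dropped at the end of the iteration.
unset($image, $owner);

$imageFreed = $imageProbe->get() === null;     // nothing references the image
$ownerStillHeld = $ownerProbe->get() !== null; // the map still holds the owner

// Mimic clearing the map: every entry is dropped, so the owner can go too.
$identityMap = [];
$ownerFreedAfterClear = $ownerProbe->get() === null;

var_dump($imageFreed, $ownerStillHeld, $ownerFreedAfterClear);
```

Removing one entry frees only that object. Anything else the map picked up along the way stays in memory until its own entry is removed.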
A possible solution is to use EntityManager::clear() instead of EntityManager::detach(), which clears the identity map completely.
foreach ($iterable as $image_row) {
    $image = $image_row[0];
    // here, because of EAGER loading, the owner entity is already in memory
    // and can be accessed via $image->getOwner()
    // do something with the image
    write_image_data_to_file($image);
    $entity_manager->clear();
    // now `$identityMap` is cleared of ALL the entities it held:
    // the `Image` and the `User` loaded in this iteration and, as a
    // SIDE EFFECT, all OTHER entities you may have loaded earlier.
    // So when you start this loop you must NOT rely on any entities
    // you have `persist()`ed or `remove()`d earlier:
    // all changes since the last `flush()` will be lost.
}
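Clearing on every single iteration is not the only option. A variation, which is not part of the snippet above, is to clear once per batch of rows. The sketch below is plain PHP with stand-in data so that it runs on its own; `processInBatches` and its parameters are hypothetical names, and the batch size of 100 is an arbitrary assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: handles each row and calls $clear after every
 * $batchSize rows, plus once at the end for a partial last batch.
 *
 * @return int number of times $clear was invoked
 */
function processInBatches(iterable $rows, callable $handle, callable $clear, int $batchSize = 100): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $seen = 0;
    $clears = 0;
    foreach ($rows as $row) {
        $handle($row);
        // A full batch has been handled: release everything held so far.
        if (++$seen % $batchSize === 0) {
            $clear();
            $clears++;
        }
    }
    // Release whatever is left over from an incomplete final batch.
    if ($seen % $batchSize !== 0) {
        $clear();
        $clears++;
    }

    return $clears;
}

// Stand-in data instead of a real Doctrine iterator.
$handled = [];
$clears = processInBatches(
    range(1, 250),
    function (int $row) use (&$handled): void {
        $handled[] = $row;
    },
    function (): void {
        // with Doctrine this is where $entity_manager->clear() would go
    },
    100
);

var_dump(count($handled), $clears);
```

With Doctrine, `$rows` would be the `$iterable` from `$query->iterate()`, `$handle` would read `$image_row[0]`, and `$clear` would call `$entity_manager->clear()`. The trade-off is that up to one batch of entities stays in memory at a time, and the warning about unflushed changes applies at every clear.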
Hopefully this helps in understanding Doctrine iteration.
Regarding high memory usage with Doctrine batch-processing iteration, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23545768/