performance - "Recycling" items in an iterator for better performance

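I have a file that contains multiple instances of some complex data type (think of a trajectory of events). The API for reading this file is written in C, and I don't have much control over it. To expose it to Rust, I implemented the following interface: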

// A single event read from the file.
// Debug is derived because the event is printed with {:?} further down.
#[derive(Debug)]
struct Event {
    a: u32,
    b: f32,
}

// A handle to the file used for I/O
struct EventFile;

impl EventFile {
    fn open() -> Result<EventFile, Error> {
        unimplemented!()
    }

    // Read the next step of the trajectory into `event`
    fn read(&self, event: &mut Event) -> Result<(), Error> {
        event.a = unimplemented!();
        event.b = unimplemented!();
        Ok(())
    }
}

To access the file contents, I can call the read function until it returns an Err, like this:

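// `open` returns a Result, so it has to be unwrapped first
// (`?` assumes an enclosing function that returns a Result)
let event_file = EventFile::open()?;
let mut event = Event::new();

// Keep reading into the same `event` until `read` returns an Err
while event_file.read(&mut event).is_ok() {
    println!("{:?}", event);
}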

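Because the event is reused on every call to read, memory is not repeatedly allocated and deallocated, which hopefully yields some performance improvement (the Event struct is much larger in the actual implementation).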

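Now, it would be nice to be able to access this data through an iterator. However, as far as I understand, this means I have to create a new instance of Event every time the iterator yields, because I cannot reuse the event inside the iterator. That would hurt performance: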

struct EventIterator {
    event_file: EventFile,
}
impl Iterator for EventIterator {
    type Item = Event;
    fn next(&mut self) -> Option<Event> {
        let mut event = Event::new(); // costly allocation
        let result = self.event_file.read(&mut event);
        match result {
            Ok(_) => Some(event),
            Err(_) => None,
        }
    }
}

let it = EventIterator { event_file };
it.map(|event| unimplemented!())

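Is there a way to somehow "recycle" or "reuse" the event inside the iterator? Or is this a concept that simply does not transfer to Rust, in which case I have to live with worse performance when using iterators?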

Best answer

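You can "recycle" items between iterations by wrapping the Item in a reference counter. The idea is that if the caller keeps the item around between iterations, the iterator allocates a new object and returns that one. If the caller drops the item before the next iteration begins, the item is recycled. This is ensured by std::rc::Rc::get_mut(), which only returns a reference if the reference count is exactly 1.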
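To make the role of Rc::get_mut concrete, here is a minimal sketch of its behavior in isolation. The helper name is_unique is made up for this illustration. Note that, per the standard library documentation, outstanding Weak pointers also cause get_mut to return None, not only additional Rc handles.

```rust
use std::rc::Rc;

/// Hypothetical helper: true if `rc` has no other owners,
/// i.e. `Rc::get_mut` would hand out a mutable reference.
fn is_unique(rc: &mut Rc<String>) -> bool {
    Rc::get_mut(rc).is_some()
}

fn main() {
    let mut slot = Rc::new(String::from("event"));
    // Only one owner: mutable access is granted.
    assert!(is_unique(&mut slot));

    // Hand out a second handle, as an iterator does when it yields the item.
    let handed_out = Rc::clone(&slot);
    assert_eq!(Rc::strong_count(&slot), 2);
    assert!(!is_unique(&mut slot));

    // Once the caller drops its handle, the slot can be mutated again.
    drop(handed_out);
    assert!(is_unique(&mut slot));
}
```

This is the check an iterator can use to decide between reusing the stored item and allocating a new one.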

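The downside of this approach is that your Iterator yields Rc<Foo> instead of Foo. It also adds code complexity and (possibly) some runtime cost due to reference counting (which may be eliminated entirely if the compiler can prove it unnecessary).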

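You therefore need to measure whether this actually gives you a performance win. Allocating a new object on every iteration may look costly, but allocators are good at this...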
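A rough way to take such a measurement is sketched below. Everything in it is an assumption made for illustration: fill stands in for reading one event, a Vec<u8> stands in for a large event, and the iteration count and buffer size are arbitrary. Wall-clock timing with std::time::Instant is noisy, so treat the numbers as a first hint only, build with optimizations, and prefer a proper benchmarking harness for real decisions.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for "read one event": fills `buf` with `len` bytes.
fn fill(buf: &mut Vec<u8>, seed: u8, len: usize) {
    buf.clear();
    buf.resize(len, seed);
}

/// Allocates a fresh buffer on every iteration.
fn sum_fresh(iterations: usize, len: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..iterations {
        let mut buf = Vec::new();
        fill(&mut buf, i as u8, len);
        // Use the data so the work is not trivially optimized away.
        total += buf.last().copied().unwrap_or(0) as u64;
    }
    total
}

/// Reuses a single buffer across all iterations.
fn sum_reused(iterations: usize, len: usize) -> u64 {
    let mut total = 0u64;
    let mut buf = Vec::new();
    for i in 0..iterations {
        fill(&mut buf, i as u8, len);
        total += buf.last().copied().unwrap_or(0) as u64;
    }
    total
}

/// Runs `f` once and returns its result together with the elapsed time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

fn main() {
    let (a, fresh) = timed(|| sum_fresh(100_000, 4096));
    let (b, reused) = timed(|| sum_reused(100_000, 4096));
    // Both variants must compute the same result.
    assert_eq!(a, b);
    println!("fresh: {:?}, reused: {:?}", fresh, reused);
}
```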

The recycling iterator could look something like this:

use std::rc::Rc;

#[derive(Default)]
struct FoobarIterator {
    item: Rc<String>,
}

impl Iterator for FoobarIterator {
    type Item = Rc<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let item = match Rc::get_mut(&mut self.item) {
            Some(item) => {
                // This path is only taken if the caller
                // did not keep the item around
                // so we are the only reference-holder!
                println!("Item is re-used!");
                item
            },
            None => {
                // Let go of the item (the caller gets to keep it)
                // and create a new one
                println!("Creating new item!");
                self.item = Rc::new(String::new());
                Rc::get_mut(&mut self.item).unwrap()
            }
        };
        // Create the item, possibly reusing the same allocation...
        item.clear();
        item.push('a');
        Some(Rc::clone(&self.item))
    }
}

fn main() {
    // This will only print "Item is re-used"
    // because `item` is dropped before the next cycle begins
    for item in FoobarIterator::default().take(5) {
        println!("{}", item);
    }

    // This will allocate a new object on every call after the first
    // (the first call still reuses the default item)
    // because the Vec retains ownership.
    let _: Vec<_> = FoobarIterator::default().take(5).collect();
}
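Applied to the Event type from the question, the same idea might look like the sketch below. The EventFile here is a hypothetical in-memory stand-in for the C-backed reader: it has a made-up remaining counter, and its read takes &mut self (unlike the question's &self) only because the mock tracks state. The slot field name and the derived Default on Event are also assumptions of this sketch.

```rust
use std::rc::Rc;

/// Event from the question; `Default` is added so a blank value can be created.
#[derive(Debug, Default)]
struct Event {
    a: u32,
    b: f32,
}

/// Hypothetical stand-in for the C-backed file:
/// yields `remaining` events, then returns an error.
struct EventFile {
    remaining: u32,
}

impl EventFile {
    /// Mimics the question's `read`: overwrites `event` in place.
    fn read(&mut self, event: &mut Event) -> Result<(), ()> {
        if self.remaining == 0 {
            return Err(());
        }
        self.remaining -= 1;
        event.a = self.remaining;
        event.b = self.remaining as f32;
        Ok(())
    }
}

struct EventIterator {
    event_file: EventFile,
    slot: Rc<Event>,
}

impl Iterator for EventIterator {
    type Item = Rc<Event>;

    fn next(&mut self) -> Option<Rc<Event>> {
        // If the caller still holds the previous event, start a fresh one.
        if Rc::get_mut(&mut self.slot).is_none() {
            self.slot = Rc::new(Event::default());
        }
        // The slot is uniquely owned at this point, so this cannot fail.
        let event = Rc::get_mut(&mut self.slot).expect("slot is uniquely owned");
        // Stop iterating as soon as `read` reports an error.
        self.event_file.read(event).ok()?;
        Some(Rc::clone(&self.slot))
    }
}

fn main() {
    let events = EventIterator {
        event_file: EventFile { remaining: 3 },
        slot: Rc::default(),
    };
    // Each `event` is dropped at the end of the loop body,
    // so the same allocation is reused for every step.
    for event in events {
        println!("{:?}", event);
    }
}
```

As in the answer's example, collecting the iterator into a Vec keeps every Rc alive, so each step then gets its own allocation.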

Regarding performance - "Recycling" items in an iterator for better performance, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58910603/
