rust - Writing Rust-y code: Keeping references to different structs depending on type of object in XML

I'm having a hard time formulating this in a Rust-y way, since my brain is still hard-wired for Python. So I have an XML file:

<xml>
<car>
  <name>First car</name>
  <brand>Volvo</brand>
</car>
<plane>
  <name>First plane</name>
  <brand>Boeing</brand>
</plane>
<car>
  <name>Second car</name>
  <brand>Volvo</brand>
</car>
</xml>

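In reality it's much more complex, and the XML is about 500-1000MB in size. I'm reading it with quick-xml, which gives me events such as Start (tag start), Text, and End (tag end), and I'm keeping a state machine to track where I am.
Now I want to offload the parsing of car and plane to different modules (they need to be handled differently) that share a base implementation/trait.
So far so good.
Now, using my state machine, I know when I need to offload to the car or the plane: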

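When I enter the main car tag, I want to create a new instance of the car

After that, offload everything up to the corresponding </car> to it

When we reach the end, I call .save() on the car implementation to store it elsewhere, and the instance can be freed/destroyed.

But this means that in my main loop I need to create a new instance of the car and keep track of it (and the same for the plane, if that is the main element).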

    let mut current_xml_section: I_DONT_KNOW_THE_TYPE = Some()
    

    loop {
        match reader.read_event(&mut buf) {

            Ok(Event::Start(ref e)) => {
                if state == State::Unknown {
                    match e.name() {
                        b"car" => {
                            state = State::InSection;
                            current_section = CurrentSection::Car;
                            state_at_depth = depth;
                            
                            current_xml_section = CurrentSection::Car::new(e); // this won't work

                        },
                        b"plane" => {
                            state = State::InSection;
                            current_section = CurrentSection::Plane;
                            state_at_depth = depth;
                            
                            current_xml_section = CurrentSection::Plane::new(e); // this won't work
                        },
                        _ => (),
                    };
                }else{
                    current_xml_section.start_tag(e); // this won't work
                }
                depth += 1;
            },
            Ok(Event::End(ref e)) => {
                depth -= 1;
                if state == State::InSection && state_at_depth == depth {
                    state = State::Unknown;
                    current_section = CurrentSection::Unknown;
                    state_at_depth = 0;
                    
                    current_xml_section.save(); // this won't work
                    // Free current_xml_section here
                }else{
                    if state == State::InSection {
                        current_xml_section.end_tag(e) // this won't work
                    }
                }
            },
            // unescape and decode the text event using the reader encoding
            Ok(Event::Text(e)) => (
                if state == State::InSection {
                    current_xml_section.text_data(e) // this won't work
                }
            ),
            Ok(Event::Eof) => break, // exits the loop when reaching end of file
            Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
            _ => (), // There are several other `Event`s we do not consider here
        }

        // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
        buf.clear();
    }
}

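So I basically don't know how to keep a reference to the "current" object in the main loop (sorry, Python terminology), given that:

We may or may not have a current tag that we're handling

The section may be a reference to either a car or a plane

I've also considered:

Using Serde, but it's a massive document and frankly I don't know its entire structure (I'm black-box decoding it), so it would need to be passed to Serde in blocks (and I didn't manage to do that, even though I tried)

Keeping a reference to the latest plane and the latest car (starting by creating blank objects outside the main loop), but that feels ugly

Using generics

Any nudge in the right direction would be welcome as I try to untangle my brain!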

Best answer

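Event-driven parsing of XML lends itself particularly well to a scope-driven approach, where each level is parsed by a different function.
For example, your main loop could look like this: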

    loop {
        match reader.read_event(&mut buf) {
            Ok(Event::Start(ref e)) => {
                match e.name() {
                    b"car" => handle_car(&mut reader, &mut buf)?,
                    b"plane" => handle_plane(&mut reader, &mut buf)?,
                    _ => return Err("Unexpected Tag"),
                }
            },
            Ok(Event::Eof) => break,
            _ => (),
        }
    }

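Note that the inner match statement only has to consider the XML tags that can occur at the top level; any other tag is unexpected and should produce an error. handle_car would look something like this: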

fn handle_car(reader: &mut Reader<&[u8]>, buf: &mut Vec<u8>) -> Result<(), ErrType> {

    let mut car = Car::new();

    loop {
        match reader.read_event(buf) {
            Ok(Event::Start(ref e)) => {
                match e.name() {
                    b"name" =>  { 
                        car.name = handle_name(reader, buf)?;
                    },
                    b"brand" =>  {
                        car.brand = handle_brand(reader, buf)?;
                    },
                    _ => return Err("bad tag"),
                }
            },
            Ok(Event::End(_)) => break, // closing </car> ends this scope
            Ok(Event::Eof) => return Err("Unexpected EOF"),
            _ => (),
        }
    }

    car.save();

    Ok(())
}

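handle_car creates its own instance of Car, which lives within the scope of that function. It has its own loop, in which it handles all the tags that can occur inside it. If those tags contain even more tags, you just introduce a new set of handler functions for them. The function returns a Result so that an error can be passed up if the input structure does not match expectations (as can any errors produced by quick_xml, which I have ignored here but real code would handle).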
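The error propagation described above can be sketched without the library. This is only an illustration under stated assumptions: `Event`, `LibError`, `ParseError`, and `handle_text` are hypothetical stand-ins invented for this sketch, not quick-xml types, and the event source is a plain iterator rather than a `Reader`.

```rust
/// Stand-in for a parser event; NOT quick-xml's `Event` type.
#[derive(Debug, PartialEq)]
enum Event {
    Start(&'static str),
    Text(&'static str),
    End(&'static str),
}

/// Stand-in for an error reported by the XML library (hypothetical).
#[derive(Debug, PartialEq)]
struct LibError(&'static str);

/// Error type returned by every handler in this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    Xml(LibError),
    UnexpectedTag(&'static str),
    UnexpectedEof,
}

// Lets `?` turn a library error into a ParseError automatically.
impl From<LibError> for ParseError {
    fn from(e: LibError) -> Self {
        ParseError::Xml(e)
    }
}

/// Reads the text of a leaf element such as <name> or <brand>.
/// Assumes the caller has already consumed the opening tag.
fn handle_text(
    events: &mut impl Iterator<Item = Result<Event, LibError>>,
) -> Result<String, ParseError> {
    let mut text = String::new();
    loop {
        // Running out of events before the closing tag is an error.
        let next = events.next().ok_or(ParseError::UnexpectedEof)?;
        // A failed read is converted to ParseError::Xml by the From impl.
        match next? {
            Event::Text(t) => text.push_str(t),
            // The closing tag ends this scope and hands control back.
            Event::End(_) => return Ok(text),
            // A leaf element is not expected to contain child elements.
            Event::Start(tag) => return Err(ParseError::UnexpectedTag(tag)),
        }
    }
}
```

With a `From` conversion like this in place, each handler can use `?` on both its own structural checks and the underlying reads, so a failure at any depth surfaces at the top-level loop.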
This pattern has some advantages when parsing XML:

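The structure of the code matches the expected structure of the XML, making it easier to read and understand.

The state is implicit in the structure of the code. No state variables or depth counters are needed.

Common tags that appear in multiple places (e.g. <name> and <brand>) can be handled by common functions that are reused.

If the XML format you are parsing has nested structures (e.g. if <car> could contain another <car>), this is handled by recursion.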
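The points above can be seen together in a small, library-free sketch. Everything here is an assumption made for illustration: `Event`, `Vehicle`, `Parsed`, `handle_text`, `handle_vehicle`, and `parse_top_level` are hypothetical names, the events come from a plain iterator instead of quick-xml, the stream is assumed to start inside the root element, and parsed values are returned rather than passed to `save()`.

```rust
/// Stand-in for a parser event; NOT quick-xml's `Event` type.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(&'static str),
    Text(&'static str),
    End(&'static str),
}

#[derive(Debug, Default, PartialEq)]
struct Vehicle {
    name: String,
    brand: String,
}

#[derive(Debug, PartialEq)]
enum Parsed {
    Car(Vehicle),
    Plane(Vehicle),
}

/// Leaf handler shared by <name> and <brand>.
fn handle_text(events: &mut impl Iterator<Item = Event>) -> Result<String, String> {
    let mut text = String::new();
    loop {
        match events.next() {
            Some(Event::Text(t)) => text.push_str(t),
            Some(Event::End(_)) => return Ok(text),
            Some(Event::Start(tag)) => return Err(format!("unexpected tag <{}>", tag)),
            None => return Err("unexpected end of input".to_string()),
        }
    }
}

/// Handles everything between an opening <car>/<plane> and its closing tag.
/// No state variable or depth counter: the call stack tracks the position.
fn handle_vehicle(events: &mut impl Iterator<Item = Event>) -> Result<Vehicle, String> {
    let mut vehicle = Vehicle::default();
    loop {
        match events.next() {
            Some(Event::Start("name")) => vehicle.name = handle_text(events)?,
            Some(Event::Start("brand")) => vehicle.brand = handle_text(events)?,
            Some(Event::Start(tag)) => return Err(format!("unexpected tag <{}>", tag)),
            // Child handlers consume their own closing tags, so this one is ours.
            Some(Event::End(_)) => return Ok(vehicle),
            Some(Event::Text(_)) => {} // e.g. whitespace between child tags
            None => return Err("unexpected end of input".to_string()),
        }
    }
}

/// Top-level loop: only decides which handler owns the next element.
fn parse_top_level(events: &mut impl Iterator<Item = Event>) -> Result<Vec<Parsed>, String> {
    let mut out = Vec::new();
    while let Some(event) = events.next() {
        match event {
            Event::Start("car") => out.push(Parsed::Car(handle_vehicle(events)?)),
            Event::Start("plane") => out.push(Parsed::Plane(handle_vehicle(events)?)),
            Event::Start(tag) => return Err(format!("unexpected tag <{}>", tag)),
            _ => {}
        }
    }
    Ok(out)
}
```

Because cars and planes have the same fields in the sample XML, this sketch reuses one handler for both; with differing structures, each kind would get its own function as in the answer.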

Your original problem of not knowing how to store the Car/Plane within the main loop is avoided entirely.
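For comparison, if the state machine from the question were kept, one possible way to hold a "current" object that may be absent and may be either kind is an `Option<Box<dyn Trait>>`. The following is a rough sketch under assumptions, not part of the answer's approach: the `Section` trait, its methods, `new_section`, `run`, and the `(kind, payload)` tuples standing in for parser events are all hypothetical.

```rust
/// Hypothetical shared trait for anything that can consume a section.
trait Section {
    fn text_data(&mut self, text: &str);
    /// Consumes the section; returns a string here instead of storing it.
    fn save(self: Box<Self>) -> String;
}

#[derive(Default)]
struct Car {
    text: String,
}

#[derive(Default)]
struct Plane {
    text: String,
}

impl Section for Car {
    fn text_data(&mut self, text: &str) {
        self.text.push_str(text);
    }
    fn save(self: Box<Self>) -> String {
        format!("car:{}", self.text)
    }
}

impl Section for Plane {
    fn text_data(&mut self, text: &str) {
        self.text.push_str(text);
    }
    fn save(self: Box<Self>) -> String {
        format!("plane:{}", self.text)
    }
}

/// Picks a handler for a top-level tag; `None` for tags we do not know.
fn new_section(tag: &str) -> Option<Box<dyn Section>> {
    match tag {
        "car" => Some(Box::new(Car::default())),
        "plane" => Some(Box::new(Plane::default())),
        _ => None,
    }
}

/// Tiny state machine over (kind, payload) pairs standing in for events.
fn run(events: &[(&str, &str)]) -> Vec<String> {
    let mut current: Option<Box<dyn Section>> = None;
    let mut depth = 0usize; // nesting depth inside the current section
    let mut saved = Vec::new();
    for &(kind, payload) in events {
        match kind {
            "start" => {
                if current.is_none() {
                    current = new_section(payload);
                    depth = 0;
                } else {
                    depth += 1; // nested tag such as <name>
                }
            }
            "text" => {
                if let Some(section) = current.as_mut() {
                    section.text_data(payload);
                }
            }
            "end" => {
                if depth > 0 {
                    depth -= 1; // closing a nested tag
                } else if let Some(section) = current.take() {
                    // take() moves the section out and leaves None behind.
                    saved.push(section.save());
                }
            }
            _ => {}
        }
    }
    saved
}
```

This works, but it brings back the explicit state and depth bookkeeping that the scope-driven version makes unnecessary.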

Regarding rust - Writing Rust-y code: Keeping references to different structs depending on type of object in XML, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65908064/
