c++ - Why does tree traversal cause undefined behavior when a pointer is not used to track the current node?

Tags: c++ undefined-behavior tree-traversal

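I'm experimenting with a compression algorithm and have run into a problem. My decoding algorithm gives the expected result when the encoded input is short, but once the input reaches a certain complexity it starts returning garbage.

After some debugging, I found that the problem is caused by tracking the current node of the tree-traversal algorithm in a "normal" (non-pointer) variable: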

auto currentNode = root;

Changing this to track the current node of the tree with a pointer fixes the problem:

const TreeNode* currentNode = &root;

Alternatively, replacing:

currentNode = currentNode.children[bit];

with:

auto cache = currentNode.children[bit];
currentNode = cache;

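also fixes the problem.

What I have not been able to do (even with some help from others) is pin down the cause of the undefined behavior. Everything points to the assignment here: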
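For comparison, here is a self-contained sketch of the cache workaround inside the full decoding loop. It assumes the TreeNode struct from the question; DecodeWithCopy is a hypothetical name, and std::move is added so the temporary is not deep-copied a second time. The point is that the right-hand side of the assignment is a separate object, not something stored inside currentNode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct TreeNode {
    char symbol;                     // Only relevant to leaf nodes
    std::vector<TreeNode> children;  // Leaves have 0 children, other nodes exactly 2

    TreeNode(unsigned char symbol) : symbol(symbol), children() {}
    TreeNode(unsigned char symbol, TreeNode left, TreeNode right)
        : symbol(symbol), children{left, right} {}
};

// Hypothetical variant of DecodeWithVars: it still keeps the current node by
// value, but never assigns a node directly from one of its own children.
std::vector<unsigned char> DecodeWithCopy(const std::vector<unsigned char>& input,
                                          const TreeNode& root, std::size_t size) {
    std::vector<unsigned char> output;
    TreeNode currentNode = root;

    for (unsigned char c : input) {
        for (int i = 0; i <= 7; i++) {
            int bit = (c >> (7 - i)) & 1;  // most significant bit first

            // Deep-copy the child into an independent object first (like
            // "cache"), then move it in: the source no longer lives inside
            // the vector that the assignment overwrites.
            TreeNode next = currentNode.children[bit];
            currentNode = std::move(next);

            if (currentNode.children.empty()) {  // reached a leaf
                output.push_back(currentNode.symbol);
                currentNode = root;
                if (output.size() == size) {
                    return output;
                }
            }
        }
    }
    return output;
}
```

This is correct but copies a whole subtree for every bit, which is why the pointer version is the more natural fix.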

currentNode = currentNode.children[bit];

but that is as far as we got.

What is the cause of the undefined behavior?


Code:

#include <vector>
#include <string>
#include <iostream>

struct TreeNode {
    char symbol; // Only relevant to leaf nodes
    std::vector<TreeNode> children; // Leaf nodes have 0 children, all other nodes have exactly 2

    TreeNode(unsigned char symbol) : symbol(symbol), children({}) { }
    TreeNode(unsigned char symbol, TreeNode left, TreeNode right) : symbol(symbol), children({ left, right }) { }
};

/// <summary>
/// Decodes provided `input` data up to `size` using the tree rooted at `root` to decode
/// </summary>
/// <param name="input">Encoded data</param>
/// <param name="root">Root node of decoding tree</param>
/// <param name="size">Size of unencoded data</param>
/// <returns>Unencoded data</returns>
std::vector<unsigned char> DecodeWithVars(const std::vector<unsigned char>& input, const TreeNode& root, int size) {
    std::vector<unsigned char> output = {};
    auto currentNode = root;

    for (auto& c : input) {
        for (int i = 0; i <= 7; i++) {
            int bit = (c >> (7 - i)) & 1;   // Iterating over each bit of each character in `input`

            currentNode = currentNode.children[bit];
            if (currentNode.children.size() == 0) {
                output.push_back(currentNode.symbol);
                currentNode = root;
                if (output.size() == size) {
                    return output;
                }
            }
        }
    }
    return output;
}

/// <summary>
/// Decodes provided `input` data up to `size` using the tree rooted at `root` to decode
/// Different from DecodeWithVars in that it uses a pointer to keep track of current tree node
/// </summary>
/// <param name="input">Encoded data</param>
/// <param name="root">Root node of decoding tree</param>
/// <param name="size">Size of unencoded data</param>
/// <returns>Unencoded data</returns>
std::vector<unsigned char> DecodeWithPointers(const std::vector<unsigned char>& input, const TreeNode& root, int size) {
    std::vector<unsigned char> output = {};
    const TreeNode* currentNode = &root;

    for (auto& c : input) {
        for (int i = 0; i <= 7; i++) {
            int bit = (c >> (7 - i)) & 1;   // Iterating over each bit of each character in `input`

            currentNode = &(*currentNode).children[bit];
            if ((*currentNode).children.size() == 0) {
                output.push_back((*currentNode).symbol);
                currentNode = &root;
                if (output.size() == size) {
                    return output;
                }
            }
        }
    }
    return output;
}


int main()
{
    std::string unencodedText = "AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE";
    std::vector<unsigned char> data = { 0,0,0,1,36,146,78,235,174,186,235,155,109,201,36,159,255,192 };

    TreeNode tree = TreeNode('*',
                             TreeNode('*',
                                      TreeNode('A'),
                                      TreeNode('*',
                                               TreeNode('B'),
                                               TreeNode('C')
                                      )
                             ),
                             TreeNode('*',
                                      TreeNode('*',
                                               TreeNode('D'),
                                               TreeNode(',')
                                      ),
                                      TreeNode('*',
                                               TreeNode('.'),
                                               TreeNode('E')
                                      )
                             )
    );

    auto decodedFromPointers = DecodeWithPointers(data, tree, unencodedText.size());
    std::string strFromPointers(decodedFromPointers.begin(), decodedFromPointers.end());

    auto decodedFromVars = DecodeWithVars(data, tree, unencodedText.size());
    std::string strFromVars(decodedFromVars.begin(), decodedFromVars.end());


    std::cout << strFromPointers << "\n";
    std::cout << strFromVars << "\n";
    return 0;
}

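For reference, tree represents the following tree:

[Image: the decoding tree. With bit 0 selecting the left child and bit 1 the right child, the leaf codes are A = 00, B = 010, C = 011, D = 100, ',' = 101, '.' = 110, E = 111.]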
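The leaf codes can also be read straight from the code. The sketch below assumes the question's TreeNode struct; CollectCodes and Encode are hypothetical helpers, not part of the question. It derives each leaf's bit path and packs a string in the order the decoders read bits (most significant bit first), zero-filling the unused bits of the last byte. Under those assumptions, encoding unencodedText reproduces the data vector byte for byte.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct TreeNode {
    char symbol;                     // Only relevant to leaf nodes
    std::vector<TreeNode> children;  // Leaves have 0 children, other nodes exactly 2

    TreeNode(unsigned char symbol) : symbol(symbol), children() {}
    TreeNode(unsigned char symbol, TreeNode left, TreeNode right)
        : symbol(symbol), children{left, right} {}
};

// Hypothetical helper: records the path to every leaf as a string of '0'/'1'
// characters, where '0' means children[0] (left) and '1' means children[1].
void CollectCodes(const TreeNode& node, const std::string& path,
                  std::map<char, std::string>& codes) {
    if (node.children.empty()) {
        codes[node.symbol] = path;
        return;
    }
    CollectCodes(node.children[0], path + '0', codes);
    CollectCodes(node.children[1], path + '1', codes);
}

// Hypothetical encoder: concatenates leaf codes, packs them MSB first and
// leaves the unused low bits of the final byte as zero.
std::vector<unsigned char> Encode(const std::string& text, const TreeNode& root) {
    std::map<char, std::string> codes;
    CollectCodes(root, "", codes);

    std::vector<unsigned char> out;
    int used = 0;  // bits already written into out.back()
    for (char ch : text) {
        for (char b : codes.at(ch)) {
            if (used == 0) {
                out.push_back(0);  // start a new, zero-filled byte
            }
            if (b == '1') {
                out.back() |= static_cast<unsigned char>(0x80 >> used);
            }
            used = (used + 1) % 8;
        }
    }
    return out;
}
```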

With MSVC (Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30138 for x64), I get the following output with both C++17 and C++20:

AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE
AAAAAAAAAAAAAAA*ADDDDDDE*.,.,.,.,.,.*,,,,.*ADDDDEEE

GCC (C++20, run on coliru) gives:

AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE
AAAAAAAAAAAAAAA�������C,.,.,.,.,.,.CCCCCDDDDDDEEEEE

Clang (C++17, also on coliru) gives the same result:

AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE
AAAAAAAAAAAAAAA�������C,.,.,.,.,.,.CCCCCDDDDDDEEEEE

Best answer

The line

currentNode = currentNode.children[bit];

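uses the implicitly defined copy assignment operator of TreeNode. That operator performs memberwise copy assignment, so among other things it copy-assigns the member currentNode.children[bit].children to currentNode.children.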

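However, the former is a subobject of an element of the latter. While the vector is being assigned its new value, the element that contains the source is destroyed or overwritten, so the source can change or cease to exist while it is still being read.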
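To make the aliasing concrete, here is a sketch assuming the question's TreeNode; CopyAssignLikeImplicit and IsChildOf are hypothetical helpers. The first spells out the memberwise copy the compiler generates for TreeNode; the second checks, by address, whether the right-hand side of currentNode = currentNode.children[bit] is an element owned by the vector on the left.

```cpp
#include <cassert>
#include <vector>

struct TreeNode {
    char symbol;                     // Only relevant to leaf nodes
    std::vector<TreeNode> children;  // Leaves have 0 children, other nodes exactly 2

    TreeNode(unsigned char symbol) : symbol(symbol), children() {}
    TreeNode(unsigned char symbol, TreeNode left, TreeNode right)
        : symbol(symbol), children{left, right} {}
};

// Hypothetical free function that does what the implicitly defined
// TreeNode::operator=(const TreeNode&) does: memberwise copy assignment
// in declaration order.
void CopyAssignLikeImplicit(TreeNode& self, const TreeNode& other) {
    self.symbol = other.symbol;
    // If other is self.children[i], then other.children is stored inside an
    // element of self.children, the very vector this line overwrites.
    self.children = other.children;
}

// Returns true if candidate is one of parent's direct children, i.e. an
// element stored in parent.children (compared by address, not by value).
bool IsChildOf(const TreeNode& parent, const TreeNode& candidate) {
    for (const TreeNode& child : parent.children) {
        if (&child == &candidate) {
            return true;
        }
    }
    return false;
}
```

Only pass an independent source to CopyAssignLikeImplicit; calling it with other being self.children[bit] reproduces the problematic assignment from the question.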

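I am not entirely sure whether the standard library is required to make such an assignment work, but the standard only seems to require that both sides compare equal after the assignment. That is impossible here: the vector would have to equal the children of one of its own elements.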
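If you want the original line to be well-defined without touching the decoder, one option is the copy-and-swap idiom. This is a swapped-in technique, not something the answer proposes. A minimal sketch, assuming the question's TreeNode with an added by-value assignment operator; the explicitly defaulted constructors keep copying and moving available, since a user-declared copy assignment operator would otherwise suppress the implicit move constructor.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct TreeNode {
    char symbol;                     // Only relevant to leaf nodes
    std::vector<TreeNode> children;  // Leaves have 0 children, other nodes exactly 2

    TreeNode(unsigned char symbol) : symbol(symbol), children() {}
    TreeNode(unsigned char symbol, TreeNode left, TreeNode right)
        : symbol(symbol), children{left, right} {}

    TreeNode(const TreeNode&) = default;
    TreeNode(TreeNode&&) = default;

    // Copy-and-swap: the by-value parameter is fully constructed (a deep copy,
    // or a move from an rvalue) before *this is modified, so assigning from one
    // of this node's own descendants never reads storage being overwritten.
    TreeNode& operator=(TreeNode other) noexcept {
        std::swap(symbol, other.symbol);
        children.swap(other.children);  // old children are destroyed with other
        return *this;
    }
};
```

With this operator, currentNode = currentNode.children[bit]; in DecodeWithVars no longer reads from a destroyed object, but every step still deep-copies a subtree, so the pointer version remains the cheaper traversal.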

Original question on Stack Overflow: https://stackoverflow.com/questions/70920207/
