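I am experimenting with compression algorithms and ran into a problem: my decoding algorithm gives the expected result when the encoded input is short, but once the input reaches a certain complexity it starts returning garbage.
Through some debugging, I found that the problem is caused by tracking the current node of the tree traversal in a "normal" (non-pointer) variable: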
auto currentNode = root;
Changing this to track the current node in the tree with a pointer fixes the problem:
const TreeNode* currentNode = &root;
Alternatively, replacing:
currentNode = currentNode.children[bit];
with:
auto cache = currentNode.children[bit];
currentNode = cache;
also fixes the problem.
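What I have not been able to do (even with some help from others) is determine the cause of the undefined behavior. Everything indicates that it is related to this assignment: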
currentNode = currentNode.children[bit];
but that is all we could find.
What is the reason behind the undefined behavior?
Code:
#include <vector>
#include <string>
#include <iostream>
struct TreeNode {
    char symbol; // Only relevant to leaf nodes
    std::vector<TreeNode> children; // Leaf nodes have 0 children, all other nodes have exactly 2
    TreeNode(unsigned char symbol) : symbol(symbol), children({}) { }
    TreeNode(unsigned char symbol, TreeNode left, TreeNode right) : symbol(symbol), children({ left, right }) { }
};
/// <summary>
/// Decodes provided `input` data up to `size` using the tree rooted at `root` to decode
/// </summary>
/// <param name="input">Encoded data</param>
/// <param name="root">Root node of decoding tree</param>
/// <param name="size">Size of unencoded data</param>
/// <returns>Unencoded data</returns>
std::vector<unsigned char> DecodeWithVars(const std::vector<unsigned char>& input, const TreeNode& root, int size) {
    std::vector<unsigned char> output = {};
    auto currentNode = root;
    for (auto& c : input) {
        for (int i = 0; i <= 7; i++) {
            int bit = (c >> (7 - i)) & 1; // Iterating over each bit of each character in `input`
            currentNode = currentNode.children[bit];
            if (currentNode.children.size() == 0) {
                output.push_back(currentNode.symbol);
                currentNode = root;
                if (output.size() == size) {
                    return output;
                }
            }
        }
    }
    return output;
}
/// <summary>
/// Decodes provided `input` data up to `size` using the tree rooted at `root` to decode
/// Different from DecodeWithVars in that it uses a pointer to keep track of current tree node
/// </summary>
/// <param name="input">Encoded data</param>
/// <param name="root">Root node of decoding tree</param>
/// <param name="size">Size of unencoded data</param>
/// <returns>Unencoded data</returns>
std::vector<unsigned char> DecodeWithPointers(const std::vector<unsigned char>& input, const TreeNode& root, int size) {
    std::vector<unsigned char> output = {};
    const TreeNode* currentNode = &root;
    for (auto& c : input) {
        for (int i = 0; i <= 7; i++) {
            int bit = (c >> (7 - i)) & 1; // Iterating over each bit of each character in `input`
            currentNode = &currentNode->children[bit];
            if (currentNode->children.size() == 0) {
                output.push_back(currentNode->symbol);
                currentNode = &root;
                if (output.size() == size) {
                    return output;
                }
            }
        }
    }
    return output;
}
int main()
{
    std::string unencodedText = "AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE";
    std::vector<unsigned char> data = { 0,0,0,1,36,146,78,235,174,186,235,155,109,201,36,159,255,192 };
    TreeNode tree = TreeNode('*',
        TreeNode('*',
            TreeNode('A'),
            TreeNode('*',
                TreeNode('B'),
                TreeNode('C')
            )
        ),
        TreeNode('*',
            TreeNode('*',
                TreeNode('D'),
                TreeNode(',')
            ),
            TreeNode('*',
                TreeNode('.'),
                TreeNode('E')
            )
        )
    );
    auto decodedFromPointers = DecodeWithPointers(data, tree, unencodedText.size());
    std::string strFromPointers(decodedFromPointers.begin(), decodedFromPointers.end());
    auto decodedFromVars = DecodeWithVars(data, tree, unencodedText.size());
    std::string strFromVars(decodedFromVars.begin(), decodedFromVars.end());
    std::cout << strFromPointers << "\n";
    std::cout << strFromVars << "\n";
    return 0;
}
For reference, tree represents the following tree:
With MSVC (Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30138 for x64), I get the following output with either C++17 or C++20:
AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE
AAAAAAAAAAAAAAA*ADDDDDDE*.,.,.,.,.,.*,,,,.*ADDDDEEE
GCC (C++20, run on coliru; you may need to click Edit to run it) gives:
AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE
AAAAAAAAAAAAAAA�������C,.,.,.,.,.,.CCCCCDDDDDDEEEEE
Clang (C++17, also on coliru) gives the same result:
AAAAAAAAAAAAAAABBBBBBBC,.,.,.,.,.,.CCCCCDDDDDDEEEEE
AAAAAAAAAAAAAAA�������C,.,.,.,.,.,.CCCCCDDDDDDEEEEE
Best answer
The line
currentNode = currentNode.children[bit];
uses the implicitly defined copy assignment operator of TreeNode. That assignment operator copy-assigns the member currentNode.children[bit].children to currentNode.children.
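However, the former is a subobject of an element of the latter. While the vector is being assigned its new value, the element that the source belongs to is destroyed, so the assignment ends up reading from an object that no longer exists.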
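To make the aliasing concrete, here is a minimal sketch that only checks the relationship and never performs the problematic assignment. `Node` is a simplified, hypothetical stand-in for the question's `TreeNode`, and `isStoredInside` is a helper name invented for this illustration.

```cpp
#include <vector>

// Simplified, hypothetical stand-in for the question's TreeNode.
struct Node {
    char symbol;
    std::vector<Node> children;
};

// Returns true if `inner` is the `children` member of one of the elements
// stored in `outer`, i.e. if `inner` lives in storage owned by `outer`.
bool isStoredInside(const std::vector<Node>& outer, const std::vector<Node>& inner) {
    for (const Node& element : outer) {
        if (&element.children == &inner) {
            return true;
        }
    }
    return false;
}
```

For a node `n` with two children, `isStoredInside(n.children, n.children[0].children)` is true. That is exactly the relationship between the destination and the source of the copy assignment in `currentNode = currentNode.children[bit];`.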
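I am not entirely sure whether the standard library is required to make such an assignment work, but the standard seems to require only that both sides of the assignment compare equal afterwards, which is impossible in this situation.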
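Both workarounds from the question avoid assigning a vector from one of its own elements' subobjects. A minimal sketch of each follows, again using a simplified, hypothetical `Node` in place of `TreeNode`; the function names `descendByCopy` and `descendByPointer` are invented here for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-in for the question's TreeNode.
struct Node {
    char symbol;
    std::vector<Node> children;
};

// Mirrors the `cache` workaround: the child is first copied into an
// independent object, so the source of the final assignment is no longer
// owned by the vector that is being overwritten.
void descendByCopy(Node& current, std::size_t index) {
    Node next = current.children[index]; // independent copy of the subtree
    current = std::move(next);           // no aliasing with current.children
}

// Mirrors the pointer workaround: nothing is copied or destroyed, only the
// pointer is updated to refer to the child inside the original tree.
const Node* descendByPointer(const Node* current, std::size_t index) {
    return &current->children[index];
}
```

The pointer version is likely the better fit for a decoder, since it avoids copying a whole subtree on every bit; it does assume that the tree outlives the traversal and is not modified during it.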
Regarding "c++ - Why does tree traversal cause undefined behavior when not using a pointer to track the current node?", the original question is on Stack Overflow: https://stackoverflow.com/questions/70920207/