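I am trying to read from a binary file into a char array. When I print an array entry, an arbitrary number is printed, then a newline, and then the number I expect. I really cannot figure this out.
The first few bytes of the file are: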
00 00 08 03 00 00 EA 60 00 00 00 1C 00 00 00 1C 00 00
My code:
void MNISTreader::loadImagesAndLabelsToMemory(std::string imagesPath,
                                              std::string labelsPath) {
    std::ifstream is(imagesPath.c_str());
    char *data = new char[12];
    is.read(data, 12);
    std::cout << std::hex << (int)data[2] << std::endl;
    delete [] data;
    is.close();
}
For example, it prints:
ffffff9b
8
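The 8 is correct. The preceding number changes from execution to execution. And where does this newline come from?
Best answer
You asked about reading data from a binary file and saving it into a char[], and showed us the following code that you submitted with your question: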
void MNISTreader::loadImagesAndLabelsToMemory(std::string imagesPath,
                                              std::string labelsPath) {
    std::ifstream is(imagesPath.c_str());
    char *data = new char[12];
    is.read(data, 12);
    std::cout << std::hex << (int)data[2] << std::endl;
    delete [] data;
    is.close();
}
And you want to know:
The preceding number changes from execution to execution. And where does this newline come from?
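Before actually answering that question, you need to understand binary files, that is, how a file is structured internally. When you read data from a binary file, remember that some program wrote the data to that file, and that the data was written in a structured format. That format is what is unique and important to each family or type of binary file. Most binary files follow a common pattern: they store a `header` first, then perhaps `sub headers`, then clusters, packets, or chunks, or even raw data after the header, while some binary files are purely raw data. You have to know how the file is laid out.

Is the first entry in the file a `char = 1 byte`, an `int = 4 bytes` (typical on both 32-bit and 64-bit systems, although the exact size is implementation-defined), a `float = 4 bytes`, a `double = 8 bytes`, or something else?

According to your code, you have an `array` of `char` with a size of `12`, and since a `char` is `1 byte` in memory, you are asking for `12 bytes`. The problem is that you are pulling off 12 consecutive individual bytes, and without knowing the file structure, how can you tell whether the first byte was written as an actual `char`, as an `unsigned char`, or as part of an `int`?

Consider these two different binary file structures, created from `C++ structs` that contain all the needed `data`, both written out to a file in binary format.

A common header structure that both file structures will use: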
struct Header {
// Size of Header
std::string filepath;
std::string filename;
unsigned int pathSize;
unsigned int filenameSize;
unsigned int headerSize;
unsigned int dataSizeInBytes;
};
FileA: the structure unique to File A
struct DataA {
float width;
float length;
float height;
float dummy;
};
FileB: the structure unique to File B
struct DataB {
double length;
double width;
};
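A binary file in memory would normally look something like this: the first bytes are for the file path and the file name. The four fields after the strings are unsigned, so we know that on a 32-bit system they take 4 bytes x 4 = 16 bytes in total.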
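Let's consider the two different binary files at the point where we are already past all the header information and are reading the bytes to parse. We get to the size of the data in bytes: for `FileA` we have `4 floats = 16 bytes`, and for `FileB` we have `2 doubles = 16 bytes`. So now we know how to call the method to read an `x` amount of data of type `y`. Since `y` is a `type` and `x` is an amount, we can write this as `y(x)`, as if `y` were a built-in type and `x` a numeric initializer for that type's constructor, whether the type is an `int`, a `float`, a `double`, a `char`, and so on.

Now let's say we are reading one of these two files but do not know the data structure or how its information was stored in the file. We see from the header that the data size is `16 bytes` in memory, but we do not know whether it was stored as `4 floats = 16 bytes` or as `2 doubles = 16 bytes`. Both structures are 16 bytes, but they hold a different number of values of different types.

In summary, without knowing the file's data structure and how to parse it, reading the binary really does become an `X/Y Problem`.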
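Now, assuming you do know the file structure, to try to answer your question from above you can run this little program and check out some results: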
#include <cstring>
#include <iostream>
#include <string>

int main() {
    // Using Two Strings
    std::string imagesPath("ImagesPath\\");
    std::string labelsPath("LabelsPath\\");

    // Concat Of Two Strings
    std::string full = imagesPath + labelsPath;

    // Display Of Both
    std::cout << full << std::endl;

    // Data Type Pointers
    char* cData = new char[12];
    unsigned char* ucData = new unsigned char[12];

    // Loop To Set Both Pointers To The String
    unsigned n = 0;
    for (; n < 12; ++n) {
        cData[n] = full.at(n);
        ucData[n] = full.at(n);
    }

    // Display Of Both Strings By Character And Unsigned Character
    n = 0;
    for (; n < 12; ++n) {
        std::cout << cData[n];
    }
    std::cout << std::endl;

    n = 0;
    for (; n < 12; ++n) {
        std::cout << ucData[n];
    }
    std::cout << std::endl;

    // Both Yield The Same Result
    // Okay, let's clear out the memory of these pointers.
    delete[] cData;
    delete[] ucData;
    cData = nullptr;
    ucData = nullptr;

    // Create Two Data Structures, One For Each Different File
    struct A {
        float length;
        float width;
        float height;
        float padding;
    };

    struct B {
        double length;
        double width;
    };

    // Constants For Our Data Structure Sizes
    const unsigned sizeOfA = sizeof(A);
    const unsigned sizeOfB = sizeof(B);
    static_assert(sizeof(A) == 16 && sizeof(B) == 16,
                  "this demo assumes 4-byte float and 8-byte double");

    // Create And Populate An Instance Of Each
    A a;
    a.length = 3.0f;
    a.width = 3.0f;
    a.height = 3.0f;
    a.padding = 0.0f;

    B b;
    b.length = 5.0;
    b.width = 5.0;

    // Let's First Use The `char[]` Method For Each Struct And Print Them,
    // but we need 16 bytes instead of the `12` from your problem
    char* aData = new char[16];  // FileA
    char* bData = new char[16];  // FileB

    // A has 4 floats of 4 bytes each. Copy its raw bytes into the buffer;
    // assigning `aData[0] = a.length` would only store the truncated value 3
    // in a single byte and leave the other bytes uninitialized.
    std::memcpy(aData, &a, sizeOfA);

    // Print Out Result By Individual Bytes Without Casting For A.
    // Most of these bytes are not printable characters; compare the
    // differences in what is shown on the screen between A & B.
    n = 0;
    for (; n < 16; ++n) {
        std::cout << aData[n] << " ";
    }
    std::cout << std::endl;

    // B has 2 doubles of 8 bytes each. Copy its raw bytes the same way.
    std::memcpy(bData, &b, sizeOfB);

    // Print Out Result By Individual Bytes Without Casting For B
    n = 0;
    for (; n < 16; ++n) {
        std::cout << bData[n] << " ";
    }
    std::cout << std::endl;

    // Let's Print Out Both Again, But Converted To Their Appropriate Types.
    // Copying the bytes into a typed variable reinterprets them safely.
    n = 0;
    for (; n < 4; ++n) {
        float f;
        std::memcpy(&f, aData + n * sizeof(float), sizeof(float));
        std::cout << f << " ";
    }
    std::cout << std::endl;

    n = 0;
    for (; n < 2; ++n) {
        double d;
        std::memcpy(&d, bData + n * sizeof(double), sizeof(double));
        std::cout << d << " ";
    }
    std::cout << std::endl;

    // Clean Up Memory
    delete[] aData;
    delete[] bData;
    aData = nullptr;
    bData = nullptr;

    // Even By Knowing The Appropriate Sizes We Can See A Difference
    // In The Stored Data Types. We Can Now Do The Same As Above
    // But With Unsigned Char & See If It Makes A Difference.
    unsigned char* ucAData = new unsigned char[16];
    unsigned char* ucBData = new unsigned char[16];

    // A has 4 floats of 4 bytes each
    std::memcpy(ucAData, &a, sizeOfA);

    // Print Out Result By Individual Bytes Without Casting For A
    n = 0;
    for (; n < 16; ++n) {
        std::cout << ucAData[n] << " ";
    }
    std::cout << std::endl;

    // B has 2 doubles of 8 bytes each
    std::memcpy(ucBData, &b, sizeOfB);

    // Print Out Result By Individual Bytes Without Casting For B
    n = 0;
    for (; n < 16; ++n) {
        std::cout << ucBData[n] << " ";
    }
    std::cout << std::endl;

    // Let's Print Out Both Again, But Converted To Their Appropriate Types
    n = 0;
    for (; n < 4; ++n) {
        float f;
        std::memcpy(&f, ucAData + n * sizeof(float), sizeof(float));
        std::cout << f << " ";
    }
    std::cout << std::endl;

    n = 0;
    for (; n < 2; ++n) {
        double d;
        std::memcpy(&d, ucBData + n * sizeof(double), sizeof(double));
        std::cout << d << " ";
    }
    std::cout << std::endl;

    // Clean Up Memory
    delete[] ucAData;
    delete[] ucBData;
    ucAData = nullptr;
    ucBData = nullptr;

    // So even changing from `char` to `unsigned char` doesn't help here, even
    // when reinterpreting the bytes, because these 2 files are different from
    // one another. They have a unique signature. A family of files that a
    // specific application saves as binary will all follow the same structure.
    // The key question is `what type` of data you are reading in and how much
    // of it. Without knowing the structure of the binary file, this becomes an
    // (X/Y) Problem. This is the hard part about parsing binaries: you need to
    // know the file structure.
    char c = ' ';
    std::cin.get(c);
    return 0;
}
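After running the short program above, don't worry about what each value displayed on the screen is; just look at the patterns that appear when you compare the two different file structures. It simply shows that a `struct of floats` that is `16 bytes` wide is not the same as a `struct of doubles` that is also `16 bytes` wide. So when we go back to your problem, where you are reading in `12 individual consecutive bytes`, the question becomes: what do these first `12 bytes` represent? Are they `3 ints` or `3 unsigned ints`, `3 floats`, or a combination such as `1 double` and `1 float`? What is the actual data structure of the binary file you are reading?

EDIT: In the little program I wrote, I forgot to add `std::hex` to the print statements. It could be added to every print that uses the indexed pointers, but there is no need to do so: the output on the display would be exactly the same, because the program only shows visually how the two data structures differ in memory and what their patterns look like.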
Regarding "c++ - reading from a binary file into an array: preceded by an arbitrary number", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43970850/