c++ - Annoying NUL blocks in socket buffer

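I'm currently trying to write C++ code on a Mac to download a large file (~1 GB) from a website. I think I'm going wrong somewhere when converting the socket buffer to a string, because the file I produce (a movie file) has small blocks of NUL characters scattered throughout it, and I need some way to remove them from the string obtained from the socket buffer.

Here is the part that handles the HTTP connection and the part that saves the data to the file. Some parts may be missing from this example, such as error handling and the complete socket setup.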

//I have error handling in here but stripped out from this example
char buffer[512];
portno = atoi("8080");
sockfd = socket(AF_INET, SOCK_STREAM, 0);
server = gethostbyname(address);

bzero((char *) &serv_addr, sizeof(serv_addr));

serv_addr.sin_family = AF_INET;

bcopy((char *)server->h_addr,
      (char *)&serv_addr.sin_addr.s_addr,
      server->h_length);

serv_addr.sin_port = htons(portno);

bzero(buffer,512);
header.copy(buffer,512);

n = write(sockfd,buffer,strlen(buffer));

std::string str_buff;

while((n = read(sockfd,buffer,511)) > 0){

    std::string temp(buffer,511);
    //Is this the error^^^^^^^^^?

    write_chunk_to_file(temp);
    //cut



void write_chunk_to_file(std::string chunk){
   write.open(path+fname, std::ios::out | std::ios::app);

   write << remove_header(chunk);

   write.close();
   //cut




std::string remove_header(std::string chunk){

   if(chunk.find("")){
       chunk = chunk.substr(chunk.find(""),chunk.length());
   }

   return chunk;

}

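When I compare the file downloaded by my code with the one downloaded by wget, my file contains some smaller blocks consisting only of NUL characters, and there also seem to be some extra bytes in it.

Does anyone have an idea?

Best answer

Yes, the line you pointed out is wrong: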

std::string temp(buffer,511);
//Is this the error^^^^^^^^^?

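read() returns the number of bytes actually read into the buffer, which can be fewer than the 511 you asked for. Constructing the string with a fixed length of 511 also copies whatever lies past the bytes just received: the zeros left by bzero() at first, and stale data from earlier reads later on. You need to take the return value into account: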

std::string temp(buffer,n);
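
To show the fix in context, here is a minimal sketch of a read loop that keeps only the bytes read() reported. The helper name read_all is hypothetical and not part of the question's code, and for a ~1 GB download you would write each chunk to a file opened in binary mode rather than accumulate it in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unistd.h>

// Hypothetical helper (not from the question): drain a descriptor into a
// string, appending exactly the number of bytes each read() call returned.
std::string read_all(int fd) {
    std::string out;
    char buffer[512];
    ssize_t n;
    while ((n = read(fd, buffer, sizeof buffer)) > 0) {
        // read() never reports more bytes than it was asked for.
        assert(static_cast<std::size_t>(n) <= sizeof buffer);
        // Use n, not the buffer size: the rest of the buffer is stale.
        out.append(buffer, static_cast<std::size_t>(n));
    }
    return out;
}
```

Because the length is passed explicitly, NUL bytes that really are part of the data survive, and nothing past the end of the received data is copied.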

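Also, you are reading raw data, so remove_header() does not belong in write_chunk_to_file(). A buffer may contain data from several headers and/or parts of the body at once. You need to implement a proper HTTP parser so that you can detect where each header ends, where the body starts, where the body ends, and how the body is encoded. Then you can write only the body data to your file.

The code in the question is a long way from reading an HTTP response correctly. You need to implement logic more like this (I leave it as an exercise for you to implement in C++):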
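
As one illustration of why per-chunk header stripping fails, the blank line that ends the headers can itself be split across two read() calls. The sketch below buffers incoming bytes until that blank line has been seen and only then starts handing back body data. The class name HeaderSplitter is hypothetical, and it deals with the header/body boundary only, not with how the body is encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: holds back incoming bytes until the "\r\n\r\n" that
// ends the header block has arrived, then passes body bytes through.
class HeaderSplitter {
public:
    // Feed one chunk as returned by read(); returns the body bytes in it.
    std::string feed(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        if (done_) return std::string(data, len);
        pending_.append(data, len);
        const std::size_t pos = pending_.find("\r\n\r\n");
        if (pos == std::string::npos) return std::string();  // headers incomplete
        done_ = true;
        headers_ = pending_.substr(0, pos);
        std::string body = pending_.substr(pos + 4);  // skip the blank line
        pending_.clear();
        return body;
    }

    bool headers_complete() const { return done_; }
    const std::string& headers() const { return headers_; }

private:
    std::string pending_;   // bytes seen so far while still inside the headers
    std::string headers_;   // raw header block, without the final blank line
    bool done_ = false;
};
```

The search runs over everything accumulated so far rather than over the latest chunk alone, so a separator that straddles two reads is still found.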

send request
while true:
    read line
    if not successful:
        throw error
    if line is blank:
        break while loop
    add line to headers list
parse headers list
if response can contain message body:
    if HTTP version is 1.1+, and Transfer-Encoding header is present and not "identity":
        while true:
            read line, extract delimited ASCII hexadecimal for the chunk size
            if not successful:
                throw error
            if chunk size is 0:
                break while loop
            read chunk size number of bytes
            read and discard the CRLF that follows the chunk data
        while true:
            read line
            if not successful:
                throw error
            if line is blank:
                break while loop
            add line to headers list, replace existing header if needed
        parse headers list again
    else if Content-Length header is specified:
        read Content-Length number of bytes
    else if Content-Type header is "multipart/byteranges":
        read and parse MIME-encoded chunks until terminating MIME boundary is reached
    else:
        read until connection is closed
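
The outline above might be sketched in C++ roughly as follows. This is a simplified illustration under several assumptions, not a complete HTTP client: the names Reader, read_headers and read_body are hypothetical, responses that cannot carry a body (HEAD, 1xx, 204, 304) and multipart/byteranges are not handled, and the whole body is collected in memory where a real downloader would stream it to a file. It also consumes the CRLF that follows each chunk's data.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Hypothetical buffered reader over a connected socket (or any descriptor).
class Reader {
public:
    explicit Reader(int fd) : fd_(fd) {}
    // Read through the next '\n'; return the line without its CR/LF.
    std::string read_line() {
        std::size_t pos;
        while ((pos = buf_.find('\n')) == std::string::npos)
            if (!fill()) throw std::runtime_error("connection closed mid-line");
        std::string line = buf_.substr(0, pos);
        buf_.erase(0, pos + 1);
        if (!line.empty() && line.back() == '\r') line.pop_back();
        return line;
    }
    // Read exactly n bytes, or throw if the peer closes first.
    std::string read_exact(std::size_t n) {
        while (buf_.size() < n)
            if (!fill()) throw std::runtime_error("connection closed mid-body");
        std::string out = buf_.substr(0, n);
        buf_.erase(0, n);
        return out;
    }
    // Read everything until the peer closes the connection.
    std::string read_to_eof() {
        while (fill()) {}
        std::string out;
        out.swap(buf_);
        return out;
    }
private:
    bool fill() {  // append whatever read() delivers; false at end of stream
        char tmp[4096];
        ssize_t n = read(fd_, tmp, sizeof tmp);
        if (n < 0) throw std::runtime_error("read failed");
        buf_.append(tmp, static_cast<std::size_t>(n));
        return n > 0;
    }
    int fd_;
    std::string buf_;
};

using Headers = std::map<std::string, std::string>;

// Read header lines until the blank line; names are lower-cased for lookup.
void read_headers(Reader& r, Headers& h) {
    for (std::string line; !(line = r.read_line()).empty();) {
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // sketch: skip malformed lines
        std::string name = line.substr(0, colon);
        std::transform(name.begin(), name.end(), name.begin(),
            [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        std::size_t v = line.find_first_not_of(" \t", colon + 1);
        h[name] = (v == std::string::npos) ? "" : line.substr(v);
    }
}

// Read one response and return its decoded body.
std::string read_body(Reader& r) {
    r.read_line();  // status line, e.g. "HTTP/1.1 200 OK" (not inspected here)
    Headers h;
    read_headers(r, h);
    std::string body;
    auto te = h.find("transfer-encoding");
    if (te != h.end() && te->second != "identity") {
        for (;;) {
            // Hex size; stoul stops at any ";extension" that follows it.
            std::size_t size = std::stoul(r.read_line(), nullptr, 16);
            if (size == 0) break;
            body += r.read_exact(size);
            r.read_line();  // CRLF that terminates the chunk data
        }
        read_headers(r, h);  // optional trailers, ended by a blank line
    } else if (h.count("content-length")) {
        body = r.read_exact(std::stoul(h["content-length"]));
    } else {
        body = r.read_to_eof();
    }
    return body;
}
```

All three body paths go through the same buffered Reader, so bytes that arrive in the same read() as the headers are not lost when the body starts.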

Regarding "c++ - Annoying NUL blocks in socket buffer", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28665141/
