html - Winsock recv gives garbage mixed with useful HTML

Tags: html c++ network-programming winsock

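I am trying to fetch the HTML source of the web page www.chemguide.co.uk (the page is not long) using Winsock from C++. Most of the data comes through fine, but at certain points in the output a particular character is repeated (it looks like |¦ in the console and like some kind of I here), I think in groups of 8, and there are some other odd characters as well.

Also, part of the document seems to be printed again after the end of the page (the </HTML> tag). Here is the code: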

// Portprog.cpp : Defines the entry point for the console application.
//


#include "stdafx.h"
#include <winsock2.h>
#include <sys/types.h>
#include <stdio.h>
#include <iostream>
#include <string>
#include <fstream>


#pragma comment(lib, "ws2_32.lib") //Winsock library

int getHTML(std::string *result)
{
    WSADATA wsa;
    SOCKET s;
    SOCKADDR_IN server;
    using std::string;
    using std::cout;
    using std::endl;

    cout << "Initialising Winsock...";
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
    {
        cout << "Failed. Error Code: " << WSAGetLastError();
        return 1;
    }
    cout << "Winsock initialised." << endl;

    if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
    {
        cout << "Could not create socket: " << WSAGetLastError() << endl;
        return 1;
    }
    cout << "Socket created." << endl;

    server.sin_addr.s_addr = inet_addr("217.27.240.124");
    server.sin_family = AF_INET;
    server.sin_port = htons(80); //host to network endian short

    //Connect to remote server
    if (connect(s, (SOCKADDR *)&server, sizeof(server)) < 0)
    {
        cout << "Connection failed." << endl;
        return 1;
    }
    cout << "Connected." << endl;

    //Send some data
    string srequest = "GET / HTTP/1.1\r\n";
    srequest += "Host: chemguide.co.uk\r\n";
    srequest += "Connection: close\r\n";
    srequest += "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    srequest += "\r\n";

    char crequest[10000];
    int requestSize = srequest.length() + 1;
    strncpy_s(crequest, srequest.c_str(), requestSize);

    if (send(s, crequest, requestSize, 0) < 0)
    {
        cout << "Data could not be sent." << endl;
        return 1;
    }
    cout << "Data sent." << endl;

    //Receive a reply from the server
    std::string server_reply = "";
    int recv_length;
    char buffer[1000];
    int i = 0;
    do
    {
        i = recv_length = recv(s, buffer, sizeof(buffer), 0);
        server_reply += buffer;
    } while (i != 0);
    cout << "Reply received." << endl;

    *result = server_reply;

    closesocket(s);
    WSACleanup();

    return 0;
}

int main(int argc, char *argv[])
{
    std::string response = "";
    getHTML(&response);

    std::cout << response << std::endl;
    std::ofstream file("output.txt");
    file << response;
    file.close();

    return 0;
}

Here is the output:

HTTP/1.1 200 OK

Date: Mon, 03 Aug 2015 00:22:17 GMT

Server: Apache/2.2.11

Last-Modified: Mon, 13 Apr 2015 11:56:25 GMT

ETag: "99190a-1ec2-51399cdaacc40"

Accept-Ranges: bytes

Content-Length: 7874

Connection: close

Content-Type: text/html




<html>
<head>
<title>chemguide:  helping you to understand Chemistry - Main Menu</title>

<meta name="description"
content="Main menu of a site aimed to help advanced level chemistry students to understand chemistry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />


</head>

<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">

<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>


<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌè="#006600" size="6" face="Helvetica, Arial"><p align="center"><b>Helping you to understand Chemistry</b></p></font>

<font color="#000000" size="5" face="Helvetica, Arial">
<p align="center"><b>MAIN MENU</b></p>
</font>

<pre>

</pre>
<table align="center" cellpadding="10" border="1">
<tr valign="top"><td bgcolor="#cccccc"> <font color="#ff0000" face="Helvetica, Arial" size="2"><b>New!  </b></a></font><font color="#000000" face="Helvetica, Arial" size="2">stry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />


</head>

<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">

<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>


<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌÌI have just come across a really good site of short chemistry revision videos.  You can find more about it at the top of the <a href="links.html#top"></font>links</a> page.</td></tr>
</table>
<pre>

</pre>
<table align="center" cellpadding="10" border="1">


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="keywordsearch.html#top"><b>Keyword searching</b></a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌè Chemistry.</b></font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="http://www.chemguideforcie.co.uk/index.html"><b>CIE syllabus support</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for CIE (Cambridge International) A level students and teachers.</b></font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="atommze="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌÌenu.html#top">Atomic Structure and Bonding</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers basic atomic properties (electronic structures, ionisation energies, electron affinities, atomic and ionic radii, and the atomic hydrogen emission spectrum), bonding (including intermolecular bonding) and structures (ionic, molecular, giant covalent and metallic).</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌèize="2"><a href="physmenu.html#top">Physical Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers simple kinetic theory, ideal and real gases, chemical energetics, rates of reaction including catalysis, an introduction to chemical equilibria, redox equilibria, acid-base equilibria (pH, buffer solutions, indicators, etc), solubility products, and phase equilibria (including Raoult's Law and the use of various phase diagetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌÌrams).</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="analysismenu.html#top">Instrumental analysis</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Explains how you can analyse substances using machines - mass spectrometry,  infra-red spectroscopy, NMR, UV-visible absorption spectrometry and chromatography.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgmenu.html#top">Basic Organic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes help on bonding, naming and isomerism, and a discussion of organic acids and bases.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgpropsmenu.html#top">Properties of organic compounds</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers the physical and chemical properties of compounds on UK A ÌÌÌÌÌÌÌÌèlevel chemistry syllabuses, and includes a limited amount of biochemistry.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="mechmenu.html#top">Organic Reaction Mechanisms</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers all the mechanisms required by the current UK A level chemistry syllabuses.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="about.html#top">About this site</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes a contact address if you have found any difficulties with the site.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="qandclist.html#top">Questions and comments</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A selection of questions that I have been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌèts.  There are also a number of chemistry questions that I have been asked and which I haven't been able to find good answers for!</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="book.html#top">Chemistry Calculations</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A description of the author's book on calculations at UK A level chemistry standard.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="suggestions.html#top">Textbook suggestions</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Suggestions for textbooks and revision guides covering the UK AS and A level chemistry syllabuses, with links to Amazon.co.uk if you want to follow them up.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌ̘es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>

<pre>

</pre>
<hr />

<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>

</table></center>
</BODY>
</HTML>
tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ6es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>

<pre>

</pre>
<hr />

<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>

</table></center>
</BODY>
</HTML>
tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ

I am using Visual Studio 2013. Here is my stdafx.h file:

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//

#pragma once

#define _WINSOCK_DEPRECATED_NO_WARNINGS
//#define _CRT_SECURE_NO_WARNINGS

#include "targetver.h"

#include <stdio.h>
#include <tchar.h>



// TODO: reference additional headers your program requires here

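Best answer

The problem is that you treat the data you read as a string, but you seem to have forgotten that C-style strings in C++ are terminated by the special character '\0'.

So you need to read one character fewer than the buffer size (pass sizeof(buffer) - 1 as the length to recv), and terminate the buffer you read as a string by adding the terminator at the end: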

if (i >= 0)
    buffer[i] = '\0';

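The garbage appears because, when you append buffer to the string server_reply, the += operator looks for this terminator to find the end of the string to append. If the terminator is missing, += keeps reading until it finds a byte that happens to equal the terminator, which may well lie beyond the end of buffer. Not terminating the string leads to undefined behavior.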
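
An alternative to writing the terminator is to tell the string exactly how many bytes to take, so that nothing past the received data is ever read. The sketch below is not from the original post: appendBytes and demoAppend are hypothetical names, and the buffer is pre-filled with 0xCC on the assumption that the runs of Ì in the output come from the fill byte that MSVC debug builds write into uninitialised stack memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: append exactly `count` bytes of `data` to `out`,
// ignoring whatever lies beyond them in the buffer.
void appendBytes(std::string& out, const char* data, int count)
{
    assert(count >= 0);  // callers must filter out recv's error return first
    out.append(data, static_cast<std::size_t>(count));
}

// Simulates one receive call that delivers 6 bytes into a larger buffer.
std::string demoAppend()
{
    char buffer[16];
    std::memset(buffer, 0xCC, sizeof(buffer));      // stand-in for uninitialised memory
    const char chunk[] = "<html>";
    std::memcpy(buffer, chunk, sizeof(chunk) - 1);  // 6 bytes, no terminator copied

    std::string reply;
    // `reply += buffer;` would be undefined behavior here: buffer holds no '\0'.
    appendBytes(reply, buffer, static_cast<int>(sizeof(chunk) - 1));
    return reply;
}
```

In the loop from the question this would correspond to replacing server_reply += buffer with server_reply.append(buffer, recv_length) once recv_length is known to be positive. This form also keeps any '\0' bytes that the response itself might contain.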


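Also, you do not check for errors when receiving. What do you think happens if recv returns SOCKET_ERROR (which is not equal to zero)? You will end up in an infinite loop.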
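
A receive loop that separates the three possible outcomes (data, orderly close, error) might look like the sketch below. To keep it independent of Winsock it takes the receive call as a parameter; RecvFn and receiveAll are hypothetical names introduced here, and the callable is assumed to follow the recv convention of returning a positive byte count, 0 on close, or a negative value on error.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Assumed contract (mirrors recv): fills at most `len` bytes of `buf` and
// returns the byte count, 0 when the peer closed the connection, or a
// negative value on error.
using RecvFn = std::function<int(char* buf, int len)>;

// Reads until the peer closes the connection.
// Returns true on an orderly close, false if the receive function failed.
bool receiveAll(const RecvFn& recvFn, std::string& out)
{
    char buffer[1000];
    for (;;)
    {
        const int n = recvFn(buffer, static_cast<int>(sizeof(buffer)));
        if (n < 0)
            return false;  // error: stop instead of looping forever
        if (n == 0)
            return true;   // connection closed, everything received
        assert(n <= static_cast<int>(sizeof(buffer)));
        out.append(buffer, static_cast<std::size_t>(n));  // only the bytes received
    }
}
```

With the socket s from the question, the callable could be a lambda such as [s](char* buf, int len) { return recv(s, buf, len, 0); }, since SOCKET_ERROR is a negative value. On a false return, WSAGetLastError() would give the reason.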

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/31777941/
