java - How do I store an Http Response that may contain binary data?

Tags: java http encoding httpresponse

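As I described in a previous question, I have an assignment to write a proxy server. It partially works now, but I still have a problem handling gzipped content. I store the HttpResponse in a String, and it appears I can't do that with gzip-compressed content. However, the headers are text that I need to parse, and everything arrives on the same InputStream. My question is: what do I have to do to handle binary responses correctly while still parsing the headers as Strings?

>> Please see the edit below before looking at the code.

Here is the Response class implementation: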

public class Response {
    private String fullResponse = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {
        reader = new BufferedReader(new InputStreamReader(input));
        try {
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                fullResponse += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                }
            }
            reader.close();
            fullResponse = "\r\n" + fullResponse.trim() + "\r\n\r\n";
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

    public CacheControl getCacheControl() {
        return cacheControl;
    }

    public String getFullResponse() {
        return fullResponse;
    }

    public boolean isBusy() {
        return busy;
    }

    public int getResponseCode() {
        return responseCode;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((fullResponse == null) ? 0 : fullResponse.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (!(obj instanceof Response))
            return false;
        Response other = (Response) obj;
        if (fullResponse == null) {
            if (other.fullResponse != null)
                return false;
        } else if (!fullResponse.equals(other.fullResponse))
            return false;
        return true;
    }

    @Override
    public String toString() {
        return "Response\n==============================\n" + fullResponse;
    }
}

And here is HttpPatterns:

public enum HttpPatterns {
    RESPONSE_CODE("^HTTP/1\\.1 (\\d+) .*$"),
    CACHE_CONTROL("^Cache-Control: (\\w+)$"),
    HOST("^Host: (\\w+)$"),
    REQUEST_HEADER("(GET|POST) ([^\\s]+) ([^\\s]+)$"),
    ACCEPT_ENCODING("^Accept-Encoding: .*$");

    private final Pattern pattern;

    HttpPatterns(String regex) {
        pattern = Pattern.compile(regex);
    }

    public boolean matches(String expression) {
        return pattern.matcher(expression).matches();
    }

    public Object process(String expression) {
        Matcher matcher = pattern.matcher(expression);
        if (!matcher.matches()) {
            throw new RuntimeException("Called `process`, but the expression doesn't match. Call `matches` first.");
        }

        if (this == RESPONSE_CODE) {
            return Integer.parseInt(matcher.group(1));
        } else if (this == CACHE_CONTROL) {
            return CacheControl.parseString(matcher.group(1));
        } else if (this == HOST) {
            return matcher.group(1);
        } else if (this == REQUEST_HEADER) {
            return new RequestHeader(RequestType.parseString(matcher.group(1)), matcher.group(2), matcher.group(3));
        } else { //never happens
            return null;
        }
    }


}

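Edit

I tried to implement it according to the suggestions, but it is not working and I am getting desperate. When I try to view an image, I get the following message from the browser: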

The image “http://www.google.com/images/logos/ps_logo2.png” cannot be displayed because it contains errors.

Here is the log:

Request
==============================

GET http://www.google.com/images/logos/ps_logo2.png HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Cookie: PREF=ID=31f95dd7f42dfc7d:TM=1303507626:LM=1303507626:S=D4kIZ6rGFrlOUWlm


Not Reading from the Cache!!!!
I am going to try to connect to: www.google.com at port 80
Connected.
Writing to the server's buffer...
flushed.
Getting a response...
Got a binary response!


contentLength = 26209; headers.length() = 312; responseLength = 12136; fullResponse length = 12136


Got a response!

Writing to the Cache!!!!
I am going to write the following response:

HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Thu, 05 Aug 2010 22:54:44 GMT
Date: Wed, 04 May 2011 15:05:30 GMT
Expires: Wed, 04 May 2011 15:05:30 GMT
Cache-Control: private, max-age=31536000
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 26209
X-XSS-Protection: 1; mode=block

 Response body is binary and was truncated.
Finished with request!

Here is the new Response class:

public class Response {
    private String headers = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;
    private InputStream fullResponse;
    private ContentEncoding encoding = ContentEncoding.TEXT;
    private ContentType contentType = ContentType.TEXT;
    private int contentLength;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {

        ByteArrayOutputStream tempStream = new ByteArrayOutputStream();
        InputStreamReader inputReader = new InputStreamReader(input);
        try {
            while (!inputReader.ready());
            int responseLength = 0;
            while (inputReader.ready()) {
                tempStream.write(inputReader.read());
                responseLength++;
            }
            /*
             * Read the headers
             */
            reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray())));
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                headers += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                } else if (HttpPatterns.CONTENT_ENCODING.matches(line)) {
                    encoding = (ContentEncoding) HttpPatterns.CONTENT_ENCODING.process(line);
                } else if (HttpPatterns.CONTENT_TYPE.matches(line)) {
                    contentType = (ContentType) HttpPatterns.CONTENT_TYPE.process(line);
                } else if (HttpPatterns.CONTENT_LENGTH.matches(line)) {
                    contentLength = (Integer) HttpPatterns.CONTENT_LENGTH.process(line);
                } else if (line.isEmpty()) {
                    break;
                }
            }

            InputStreamReader streamReader = new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray()));
            while (!reader.ready());//wait for initialization.
            //Now let's get the rest
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int counter = 0;
            while (streamReader.ready() && counter < (responseLength - contentLength)) {
                out.write((char) streamReader.read());
                counter++;
            }
            if (encoding == ContentEncoding.BINARY || contentType == ContentType.BINARY) {
                System.out.println("Got a binary response!");
                while (streamReader.ready()) {
                    out.write(streamReader.read());
                }
            } else {
                System.out.println("Got a text response!");
                while (streamReader.ready()) {
                    out.write((char) streamReader.read());
                }
            }
            fullResponse = new ByteArrayInputStream(out.toByteArray());

            System.out.println("\n\ncontentLength = " + contentLength + 
                    "; headers.length() = " + headers.length() + 
                    "; responseLength = " + responseLength + 
                    "; fullResponse length = " + out.toByteArray().length + "\n\n");

        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

Here is the ProxyServer class:

class ProxyServer {
    public void start() {
        while (true) {
            Socket serverSocket;
            Socket clientSocket;
            OutputStreamWriter toClient;
            BufferedWriter toServer;
            try {
                //The client is meant to put data on the port, read the socket.
                clientSocket = listeningSocket.accept();
                Request request = new Request(clientSocket.getInputStream());
                //System.out.println("Accepted a request!\n" + request);
                while(request.busy);
                //Make a connection to a real proxy.
                //Host & Port - should be read from the request
                URL url = null;
                try {
                    url = new URL(request.getRequestURL());
                } catch (MalformedURLException e){
                    url = new URL("http:\\"+request.getRequestHost()+request.getRequestURL());
                }

                System.out.println(request);

                //remove entry from cache if needed
                if (!request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    cache.remove(request);
                }

                Response response = null;

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    System.out.println("Reading from the Cache!!!!");
                    response = cache.get(request);
                } else {
                    System.out.println("Not Reading from the Cache!!!!");
                    //Get the response from the destination
                    int remotePort = (url.getPort() == -1) ? 80 : url.getPort();
                    System.out.println("I am going to try to connect to: " + url.getHost() + " at port " + remotePort);
                    serverSocket = new Socket(url.getHost(), remotePort);
                    System.out.println("Connected.");
                    serverSocket.setSoTimeout(50000);

                    //write to the server - keep it open.
                    System.out.println("Writing to the server's buffer...");
                    toServer = new BufferedWriter(new OutputStreamWriter(serverSocket.getOutputStream()));
                    toServer.write(request.getFullRequest());
                    toServer.flush();
                    System.out.println("flushed.");

                    System.out.println("Getting a response...");
                    response = new Response(serverSocket.getInputStream());
                    //System.out.println("Got a response!\n" + response);
                    System.out.println("Got a response!\n");
                    //wait for the response
                    while(response.isBusy());
                }

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && response.getResponseCode() == 200) {
                    System.out.println("Writing to the Cache!!!!");
                    cache.put(request, response);
                }
                else System.out.println("Not Writing to the Cache!!!!");
                response = filter.filter(response);

                // Return the response to the client
                toClient = new OutputStreamWriter(clientSocket.getOutputStream());
                System.out.println("I am going to write the following response:\n" + response);
                BufferedReader responseReader = new BufferedReader(new InputStreamReader(response.getFullResponse()));
                while (responseReader.ready()) {
                    toClient.write(responseReader.read());
                }
                toClient.flush();
                toClient.close();
                clientSocket.close();
                System.out.println("Finished with request!");

            } catch (IOException e) {
                e.printStackTrace();
                continue;
            }
        }
   }
}

I would appreciate any and all feedback, insights, or suggestions on how to solve this, and I would of course prefer some actual code.

Best answer

Store it in a byte array:

byte[] buffer = new byte[???];
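
To see why the String approach breaks, here is a small sketch (not part of the original answer) that pushes the eight-byte PNG signature through a String and back. It assumes UTF-8 as the charset; the question's code relies on the platform default, which may differ. `BinaryRoundTrip` and `roundTrip` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {
    // Decodes the bytes to a String and encodes them again,
    // which is what a text-based proxy effectively does.
    // Assumption: UTF-8; the platform default charset may behave differently.
    static byte[] roundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first eight bytes of every PNG file (the PNG signature).
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        byte[] back = roundTrip(png);
        System.out.println("original:   " + Arrays.toString(png));
        System.out.println("round trip: " + Arrays.toString(back));
        System.out.println("identical:  " + Arrays.equals(png, back));
    }
}
```

Under UTF-8 the byte 0x89 is not valid on its own, so the decoder substitutes a replacement character and the round trip is no longer identical. The `readLine()` loop in the question also drops the line terminators it finds inside the body and re-adds `\r\n`, which alters binary data further.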

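The procedure in more detail:

  • Create a buffer large enough for the response header (and throw an exception if the header is bigger).
  • Read bytes into the buffer until you find \r\n\r\n in the buffer. You can write a helper function such as static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle)
  • When you encounter the end of the header, create a String from the first n bytes of the buffer. You can then use RegEx on this String (also note that RegEx is not the best way to parse HTTP headers).
  • Be prepared for the buffer to contain additional data after the header; these are the first bytes of the response body. You have to copy these bytes to the output stream, output file, or output buffer.
  • Read the rest of the response body (until content-length bytes have been read or the stream is closed).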
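
As an illustration of the non-RegEx route mentioned in the list above, the following sketch splits each header line at its first colon. It is only one possible approach and is not taken from the original answer; `HeaderParser`, `parse`, and `statusCode` are hypothetical names. It assumes the header block uses `\r\n` line endings and that, for repeated header names, keeping the last value is acceptable. Note that the question's `CACHE_CONTROL` pattern captures only `(\w+)`, so it would not match the `Cache-Control: private, max-age=31536000` line shown in the log.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class HeaderParser {
    // Parses a header block (status line followed by header lines) into a map
    // keyed by lower-cased header name. Later duplicates overwrite earlier ones.
    static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = headerBlock.split("\r\n");
        for (int i = 1; i < lines.length; i++) { // line 0 is the status line
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue; // skip blank or malformed lines
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    // Extracts the numeric status code, e.g. 200 from "HTTP/1.1 200 OK".
    static int statusCode(String headerBlock) {
        String statusLine = headerBlock.split("\r\n", 2)[0];
        return Integer.parseInt(statusLine.split(" ", 3)[1]);
    }

    public static void main(String[] args) {
        String block = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: image/png\r\n"
                + "Cache-Control: private, max-age=31536000\r\n"
                + "Content-Length: 26209\r\n\r\n";
        System.out.println(statusCode(block));
        System.out.println(parse(block));
    }
}
```

Lower-casing the names makes lookups such as `parse(block).get("content-length")` independent of how the server capitalizes its headers.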

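Edit:

You did not follow the steps I suggested. inputReader.ready() is the wrong way to detect the phases of the response. There is no guarantee that the headers will be sent in a single burst.

I tried to sketch the approach in code (except for the arrayIndexOf function).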

InputStream is; // e.g. the input stream of the socket connected to the server

// Create a buffer large enough for the response header (and throw an exception if it is bigger).
byte[] headEnd = {13, 10, 13, 10}; // \r \n \r \n
byte[] buffer = new byte[10 * 1024];
int length = 0;

// Read bytes to the buffer until you find `\r\n\r\n` in the buffer.
int bytes = 0;
int pos;
while ((pos = arrayIndexOf(buffer, 0, length, headEnd)) == -1) {
    // buffer is full but the end signature has not been found
    if (length == buffer.length)
        throw new RuntimeException("Response header too long");

    bytes = is.read(buffer, length, buffer.length - length);
    // stream ended before the end signature arrived
    if (bytes == -1)
        throw new RuntimeException("Stream closed before the end of the header");
    length += bytes;
}

// pos contains the starting index of the end signature (\r\n\r\n) so we add 4 bytes
pos += 4;

// When you encounter the end of header, create a String from the first *n* bytes
// (ISO-8859-1 maps every byte to one char; needs java.nio.charset.StandardCharsets)
String header = new String(buffer, 0, pos, StandardCharsets.ISO_8859_1);

System.out.println(header);

// Be prepared that the buffer will contain additional data after the header
// ... so we process it
System.out.write(buffer, pos, length - pos);

// process the rest until connection is closed
while ((bytes = is.read(buffer, 0, buffer.length)) > -1) {
    System.out.write(buffer, 0, bytes);
}
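
The final loop above reads until the server closes the connection. When a `Content-Length` header is present, the body can instead be read to exactly that many bytes, which is the other stopping condition named in the steps. The question's log hints at why this matters: it reports `contentLength = 26209` but `responseLength = 12136`, which suggests the `ready()`-based loop stopped before the whole image had arrived. The sketch below is one hedged way to do this; `BodyReader` and `readBody` are hypothetical names, and it assumes a non-negative `contentLength` and no chunked transfer encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class BodyReader {
    // Reads exactly contentLength body bytes.
    // prefix[off .. off+len) holds body bytes that were already read
    // into the header buffer together with the header.
    static byte[] readBody(InputStream in, byte[] prefix, int off, int len,
                           int contentLength) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream(contentLength);
        int fromPrefix = Math.min(len, contentLength);
        body.write(prefix, off, fromPrefix);

        byte[] chunk = new byte[8192];
        int remaining = contentLength - fromPrefix;
        while (remaining > 0) {
            // read() may return fewer bytes than requested, so keep looping
            int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n == -1) {
                throw new EOFException("Stream closed with " + remaining + " body bytes missing");
            }
            body.write(chunk, 0, n);
            remaining -= n;
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] headerBuffer = {'\r', '\n', 1, 2, 3};       // 3 body bytes after the header end
        InputStream rest = new ByteArrayInputStream(new byte[] {4, 5, 6, 7});
        byte[] body = readBody(rest, headerBuffer, 2, 3, 5);
        System.out.println(body.length + " body bytes read");
    }
}
```

In the answer's terms, the call would be roughly `readBody(is, buffer, pos, length - pos, contentLength)`, with `contentLength` taken from the parsed headers.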

The arrayIndexOf method could look something like this (there may be faster versions):

public static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) {
    // "<=" so that a match ending exactly at offset + length is found as well
    for (int i = offset; i <= offset + length - needle.length; i++) {
        boolean match = false;
        for (int j = 0; j < needle.length; j++) {
            match = haystack[i + j] == needle[j];
            if (!match)
                break;
        }
        if (match)
            return i;
    }
    return -1;
}
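
The same byte-only rule applies when the proxy sends the response back to the browser. The `ProxyServer` in the question wraps the client socket in an `OutputStreamWriter` and reads the stored response through a `BufferedReader`, which again converts bytes to characters. A minimal sketch of a byte-for-byte copy follows; `ByteRelay` and `copy` are hypothetical names, and the 8 KiB chunk size is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ByteRelay {
    // Copies everything from 'from' to 'to' without any character decoding
    // and returns the number of bytes copied.
    static long copy(InputStream from, OutputStream to) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = from.read(chunk)) != -1) {
            to.write(chunk, 0, n);
            total += n;
        }
        to.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), client);
        System.out.println(copied + " bytes relayed");
    }
}
```

In the proxy this would be called with the stored response stream and `clientSocket.getOutputStream()` directly, with no `Writer` in between.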

Regarding "java - How do I store an Http Response that may contain binary data?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/5777503/
