java - HTTP client POST with a problematic website

Tags: java javascript httpclient

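I am trying to retrieve some data from a web site.

I wrote a Java class that seems to work for many websites, but it does not work for this particular one, which makes heavy use of JavaScript in its input form.

As you can see from the code, I specify the input fields using the names taken from the HTML source, but perhaps this website does not accept this kind of POST request?

How can I simulate the user interaction in order to retrieve the generated HTML?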

package com.transport.urlRetriver;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public class UrlRetriver {


    String stationPoller (String url, ArrayList<NameValuePair> params) {

        HttpPost postRequest;
        HttpResponse response;
        HttpEntity entity;
        String result = null;

        DefaultHttpClient httpClient = new DefaultHttpClient();


        try {

            postRequest = new HttpPost(url);

            // UrlEncodedFormEntity already implements HttpEntity, so no cast is needed
            postRequest.setEntity(new UrlEncodedFormEntity(params));
            response = httpClient.execute(postRequest);

            entity = response.getEntity();

            if(entity != null){
              InputStream inputStream = entity.getContent();
              result = convertStreamToString(inputStream);
            }



        } catch (Exception e) {

            // Report the cause instead of silently discarding it
            System.err.println("Request failed: " + e);
            result = "We had a problem";

        } finally {

            httpClient.getConnectionManager().shutdown();

        }



        return result;

    }





    void ATMtravelPoller () {




        // Three parameters are added below, so size the list accordingly
        ArrayList<NameValuePair> params = new ArrayList<NameValuePair>(3);

        String url = "http://www.atm-mi.it/it/Pagine/default.aspx";

        params.add(new BasicNameValuePair("ctl00$SPWebPartManager1$g_afa5adbb_5b60_4e50_8da2_212a1d36e49c$txt_address_s", "Viale romagna 1"));

        params.add(new BasicNameValuePair("ctl00$SPWebPartManager1$g_afa5adbb_5b60_4e50_8da2_212a1d36e49c$txt_address_e", "Viale Toscana 20"));

        params.add(new BasicNameValuePair("sf_method", "POST"));

        String result = stationPoller(url, params);

        saveToFile(result, "/home/rachele/Documents/atm/out4.html");

    }

    static void saveToFile(String toFile, String pos) {
        // try-with-resources closes the writer even if write() throws
        try (BufferedWriter out = new BufferedWriter(new FileWriter(pos))) {
            out.write(toFile);
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    private static String convertStreamToString(InputStream is) {
        StringBuilder stringBuilder = new StringBuilder();

        // Closing the reader also closes the underlying stream.
        // Note: the platform default charset is used to decode the response.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is))) {
            String line;
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line).append('\n');
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return stringBuilder.toString();
    }

}

Best answer

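It seems to me that the form may contain JavaScript-generated fields with dynamic values, intended to stop automated code from scraping the site. Please post the specific site you want to download.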
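One way to deal with dynamically valued fields is to fetch the form page first, read the hidden inputs out of the returned HTML, and send them back together with the visible fields. The sketch below covers only the extraction step and uses nothing beyond the standard library. The class name HiddenFieldExtractor and the method extractHiddenFields are hypothetical, not part of the question's code. Pages with an .aspx URL and ctl00$... field names are typically ASP.NET Web Forms pages, which usually carry hidden fields such as __VIEWSTATE and __EVENTVALIDATION; whether this particular site requires them is an assumption. The regular expressions are a simplification that does not decode HTML entities, and fields created purely by JavaScript at runtime will not appear in the fetched HTML at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects name/value pairs of hidden <input> fields.
class HiddenFieldExtractor {

    // Matches a whole <input ...> tag; its attributes are parsed separately.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" or attr='value' pairs inside a tag.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w:-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    static Map<String, String> extractHiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            String type = null;
            String name = null;
            String value = "";
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase();
                // Group 3 holds a double-quoted value, group 4 a single-quoted one
                String val = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (key.equals("type")) {
                    type = val;
                } else if (key.equals("name")) {
                    name = val;
                } else if (key.equals("value")) {
                    value = val;
                }
            }
            // Keep only hidden inputs that have a name
            if ("hidden".equalsIgnoreCase(type) && name != null) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not taken from the real site
        String html = "<form><input type=\"hidden\" name=\"__VIEWSTATE\" value=\"abc123\" />"
                + "<input type=\"text\" name=\"txt_address_s\" /></form>";
        System.out.println(extractHiddenFields(html)); // prints {__VIEWSTATE=abc123}
    }
}
```

Each returned pair could then be added to params as a BasicNameValuePair before the POST. The GET and the POST would presumably need to go through the same DefaultHttpClient instance so that session cookies are kept, whereas stationPoller as written creates and shuts down a new client on every call.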

Regarding "java - HTTP client POST with a problematic website", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4775585/
