How to log in to an ASP.NET Web Form from Java

Tags: java screen-scraping httpurlconnection

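I'm working on a Java program that needs to log in to an ASP.NET web form and then, once authenticated, download a file. Plain HTTP GET/POST requests are not a problem, but ASP.NET does not seem to give me a session ID when I connect from Java, although it does when I connect from a browser.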

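When I look at the header information in Firefox, I can see that the cookie is set by the initial login response, but the page is then immediately redirected to a new URL. I'm not sure whether it matters, but the page it redirects to after login contains iframes. I've tried loading both the main page and the iframe src, but neither gives me the cookie in the response headers.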

//Pull up the login page, extract out the hidden input variables __VIEWSTATE, __EVENTVALIDATION
URL url = new URL(loginPage);
HttpURLConnection conn = null;
conn = (HttpURLConnection) url.openConnection();
//This reads the page line-by-line and extracts out all the values from hidden input fields
Map<String,String> formFields = getViewstate(conn);

//Now re-open the URL to actually submit the POST data
conn = (HttpURLConnection) url.openConnection();            
conn.setRequestMethod("POST");
conn.setDoOutput(true);
conn.setDoInput(true);
DataOutputStream out = new DataOutputStream(conn.getOutputStream());
String postValues = URLEncoder.encode("txtUsername", "UTF-8") + "=" + URLEncoder.encode(uid, "UTF-8");
postValues += "&" + URLEncoder.encode("txtPassword", "UTF-8") + "=" + URLEncoder.encode(pwd, "UTF-8");
postValues += "&" + URLEncoder.encode("__EVENTTARGET", "UTF-8") + "=" + URLEncoder.encode("", "UTF-8");
postValues += "&" + URLEncoder.encode("__VIEWSTATE", "UTF-8") + "=" + URLEncoder.encode(formFields.get("viewstate"), "UTF-8");
postValues += "&" + URLEncoder.encode("__EVENTVALIDATION", "UTF-8") + "=" + URLEncoder.encode(formFields.get("eventvalidation"), "UTF-8");
out.writeBytes(postValues);
out.flush();
out.close();
//At this point looking at Firefox sniffer data, it should be sending back the cookie
//However there is no Set-Cookie in the header fields
for (int i = 1; (key = conn.getHeaderFieldKey(i)) != null; i++) {
    // get ASP.NET_SessionId from cookie
    if (key.equalsIgnoreCase("set-cookie")) {
        sessionId = conn.getHeaderField(key);
        sessionId = sessionId.substring(0, sessionId.indexOf(";"));
    }
}
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while ((line = rd.readLine()) != null) {
    //The page it prints out is the page it was redirected to when logged in through the browser
    System.out.println(line);
}
rd.close();
//At this point, it was a successful login, but I never got the cookie so I'm stuck
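
The `getViewstate` helper called above is not shown. As a rough illustration only, here is a minimal sketch of how hidden input fields might be pulled out of the login page HTML with regular expressions. The class name `HiddenFields` and the method `extract` are hypothetical. Unlike the code above, which looks values up under the keys `viewstate` and `eventvalidation`, this sketch keys each value by the field's own `name` attribute (for example `__VIEWSTATE`). It assumes quoted attribute values and does not decode HTML entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenFields {
    // Matches a whole <input ...> tag; attribute order inside the tag is not assumed.
    private static final Pattern INPUT =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TYPE_HIDDEN =
            Pattern.compile("\\stype\\s*=\\s*[\"']hidden[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern NAME =
            Pattern.compile("\\sname\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE =
            Pattern.compile("\\svalue\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns name -> value for every hidden input found in the given HTML. */
    public static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher input = INPUT.matcher(html);
        while (input.find()) {
            String tag = input.group();
            if (!TYPE_HIDDEN.matcher(tag).find()) {
                continue; // visible field, not part of the hidden form state
            }
            Matcher name = NAME.matcher(tag);
            if (!name.find()) {
                continue; // unnamed inputs are never submitted
            }
            Matcher value = VALUE.matcher(tag);
            fields.put(name.group(1), value.find() ? value.group(1) : "");
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample markup in the shape ASP.NET typically emits.
        String html = "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"/wEPDw==\" />"
                + "<input type=\"text\" name=\"txtUsername\" />";
        System.out.println(extract(html));
    }
}
```

A regex approach like this is fragile against unusual markup; a real HTML parser would be more robust if one is available.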

Best answer

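I think HttpClient, which HtmlUnit is built on, has the lower-level functionality you're looking for. It handles cookies well, but if you need more than that, Kurt is right that you should look for something with more features. If you really need full browser functionality, you could try something like Selenium/WebDriver, which actually automates a browser under program control.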
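
If adding a library is not an option, one alternative that is sometimes suggested (not the HttpClient route described in this answer) is the JDK's built-in `java.net.CookieManager`. Once it is installed as the default `CookieHandler`, `HttpURLConnection` should consult it on every request and response, including intermediate redirect responses, so a session cookie set just before a redirect is retained instead of being lost. The sketch below is a minimal illustration under that assumption. The class `SessionCookies`, its methods, the URL, and the cookie value are all hypothetical, and the login response is simulated rather than fetched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.Collections;

public class SessionCookies {
    /** Installs a JVM-wide in-memory cookie jar used by HttpURLConnection. */
    public static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** Feeds one Set-Cookie header value into the jar, as a real response would. */
    public static void store(CookieManager manager, URI uri, String setCookieValue) {
        try {
            manager.put(uri, Collections.singletonMap(
                    "Set-Cookie", Collections.singletonList(setCookieValue)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the value of the named cookie stored for the URI, or null if absent. */
    public static String find(CookieManager manager, URI uri, String name) {
        for (HttpCookie cookie : manager.getCookieStore().get(uri)) {
            if (cookie.getName().equalsIgnoreCase(name)) {
                return cookie.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CookieManager manager = install();
        URI login = URI.create("http://example.com/login.aspx"); // hypothetical URL
        // Simulated login response header; a real run would POST the form instead.
        store(manager, login, "ASP.NET_SessionId=abc123; path=/; HttpOnly");
        System.out.println(find(manager, login, "ASP.NET_SessionId"));
    }
}
```

In a real program, `install()` would be called once before the first `openConnection()`, and later requests to the same host should then send the stored cookie automatically. Another option is to call `conn.setInstanceFollowRedirects(false)` on the login POST and read `Set-Cookie` directly from the redirect response.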

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/4462742/
