我得到这样的东西
您好,我正在使用 Selenium Webdriver 抓取网页,我能够获取我的数据,但问题是这直接与浏览器交互,我不想打开 Web 浏览器,而是想按原样抓取所有数据
我怎样才能实现我的目标
这是我的代码
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.Select;
public class GetData {
public static void main(String args[]) throws InterruptedException {
String sDate = "27/03/2014";
WebDriver driver = new FirefoxDriver();
String url="http://www.upmandiparishad.in/commodityWiseAll.aspx";
driver.get(url);
Thread.sleep(5000);
// select barge
new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");
driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);
// click buttonctl00_ContentPlaceHolder1_txt_rate
Thread.sleep(3000);
driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
Thread.sleep(5000);
//get only table tex
WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
String htmlTableText = findElement.getText();
// do whatever you want now, This is raw table values.
System.out.println(htmlTableText);
driver.close();
driver.quit();
}
}
My updated New code
import com.gargoylesoftware.htmlunit.BrowserVersion;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.support.ui.Select;
public class Getdata1 {
public static void main(String args[]) throws InterruptedException {
WebDriver driver = new HtmlUnitDriver(BrowserVersion.FIREFOX_3_6);
driver.get("http://www.upmandiparishad.in/commodityWiseAll.aspx");
System.out.println(driver.getPageSource());
Thread.sleep(5000);
// select barge
new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");
String sDate = "12/04/2014"; //What date you want
driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);
driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
Thread.sleep(3000);
//get only table tex
WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
String htmlTableText = findElement.getText();
// do whatever you want now, This is raw table values.
System.out.println(htmlTableText);
driver.close();
driver.quit();
}
}
提前致谢
最佳答案
使用 Selenium 的 HtmlUnit 或 HtmlUnitDriver
WebDriver driver = new HtmlUnitDriver(BrowserVersion.FIREFOX_17);
driver.get("http://www.upmandiparishad.in/commodityWiseAll.aspx");
System.out.println(driver.getPageSource());
Thread.sleep(5000);
// select barge
new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");
String sDate = "12/04/2014"; //What date you want
driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);
driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
Thread.sleep(3000);
//get only table tex
WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
String htmlTableText = findElement.getText();
// do whatever you want now, This is raw table values.
System.out.println(htmlTableText);
driver.close();
driver.quit();
要获得表格输出,您可以尝试这样的操作..
String arrCells[] = htmlTableText.split(" ");
Boolean bIsANumber = false;
for(int i = 0; i < arrCells.length; i++) {
try {
int tmp = Integer.parseInt(arrCells[i]);
bIsANumber = true;
}
catch(Exception ex) {
bIsANumber = false;
}
if(bIsANumber) {
System.out.print("\n"+arrCells[i]+"\t");
}
else {
System.out.print(arrCells[i]+"\t");
}
}
关于java - 如何使用 htmlunitdriver 进行网页抓取?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22807527/