mysql - jsoup HTML parser selects the wrong table

Tags: mysql jsoup

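I need to parse http://developer.android.com/about/dashboards/index.html and store the data from the first table ("Platform Versions") in my MySQL database. Unfortunately, my program picks the last table and writes its data to the database instead. It is supposed to collect all elements tagged "table" and pick the first of the three. Any suggestions as to what is going wrong? I have not found a solution on Stack Overflow yet.

Regards, Oli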

    ArrayList<Table>tableList = new ArrayList<Table>();

    String URL ="jdbc:mysql://localhost:3306/crawler";
    String USER = "root";
    String PASSWORD = "";
    String DRIVER = "com.mysql.jdbc.Driver";
    Connection conn = null;
    try {
        Class.forName(DRIVER);
        //Connect to MySQL database
        conn=DriverManager.getConnection(URL, USER, PASSWORD);
        Statement stmt=conn.createStatement();
        URL url = new URL("http://developer.android.com/about/dashboards/index.html");
        //Connect to URL
        URLConnection con = url.openConnection();
        con.setDoOutput(true);


        Document doc = Jsoup.parse(con.getInputStream(), "UTF-8", "http://developer.android.com/about/dashboards/index.html");
        Elements allTables = doc.getElementsByTag("table"); //hold all tables
        Element table = allTables.get(0); // using first table
        Elements row = table.getElementsByTag("tr"); //each row
        for (Element link:row){
            Elements cell = link.getElementsByTag("td"); // each cell per row

            int count =0;
            Table table1 = new Table();
            for(Element link1:cell){
                String linkText=link1.text(); //each cell value
                if(count == 0){
                    table1.setVersion(linkText);
                }else if (count == 1){
                    table1.setCodename(linkText);
                }else if(count == 2){
                    table1.setApi(Integer.parseInt(linkText));
                }else if (count == 3){
                    table1.setDistribution(Float.parseFloat(linkText));
                }
                count++;
            }
            if(count !=0){
                tableList.add(table1);
            }
        }

        for (Table table1:tableList){
            stmt.executeUpdate("INSERT INTO verteilung_android2 (Version, Codename, API, Distribution) "
                    + "VALUES ('" +table1.getVersion()+ "','" +table1.getCodename()+ "','"+table1.getApi()+"','" +table1.getDistribution()+"')");
            System.out.println(tableList);
        }

    }           
    catch (SQLException e){
        e.printStackTrace();
    }catch (IOException e){
        e.printStackTrace();
    }catch (ClassNotFoundException e){
        e.printStackTrace();
    }

}

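Best answer

The problem is that the information you want is not in a table. It is in the JavaScript code that builds the chart, as shown below: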

<script>
var VERSION_DATA =
[
  {
    "chart": "//chart.googleapis.com/chart?chl=Froyo%7CGingerbread%7CIce%20Cream%20Sandwich%7CJelly%20Bean%7CKitKat&chf=bg%2Cs%2C00000000&chd=t%3A0.8%2C14.9%2C12.3%2C58.4%2C13.6&chco=c4df9b%2C6fad0c&cht=p&chs=500x250",
    "data": [
      {
        "api": 8,
        "name": "Froyo",
        "perc": "0.8"
      },
      {
        "api": 10,
        "name": "Gingerbread",
        "perc": "14.9"
      },
      {
        "api": 15,
        "name": "Ice Cream Sandwich",
        "perc": "12.3"
      },
      {
        "api": 16,
        "name": "Jelly Bean",
        "perc": "29.0"
      },
      {
        "api": 17,
        "name": "Jelly Bean",
        "perc": "19.1"
      },
      {
        "api": 18,
        "name": "Jelly Bean",
        "perc": "10.3"
      },
      {
        "api": 19,
        "name": "KitKat",
        "perc": "13.6"
      }
    ]
  }
];

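Jsoup cannot parse JavaScript code, but you can parse this code manually. You could do something like the following. It does not do exactly what you want, but it should give you a starting point.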

Element script = doc.select("script").get(7); // Get the script element that holds the chart data.
String scriptText = script.toString();
String[] lines = scriptText.split("\\r?\\n"); // This splits the string line by line.

for (int i = 0; i < lines.length; i++) {
    String line = lines[i];
    // Print every line that mentions one of the three fields.
    if (line.contains("api"))
        System.out.println(line);
    if (line.contains("name"))
        System.out.println(line);
    if (line.contains("perc"))
        System.out.println(line);
}

There may be a library out there that can parse JavaScript code, but I do not know of one.
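
As one way to take the manual parsing a step further, the sketch below pulls the `api`, `name`, and `perc` values out of the script text with a regular expression. It is only an illustration under some assumptions: the class name `VersionDataParser`, the holder type `Entry`, and the method `parse` are made up for this example, and the pattern assumes each object lists its keys in the order shown in the snippet above. It uses only the Java standard library, so the script text is passed in as a plain string (for example, the text obtained from the Jsoup `Element`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionDataParser {

    /** One object of the "data" array (hypothetical holder type for this sketch). */
    static final class Entry {
        final int api;
        final String name;
        final float perc;

        Entry(int api, String name, float perc) {
            this.api = api;
            this.name = name;
            this.perc = perc;
        }
    }

    // Matches "api": 8, "name": "Froyo", "perc": "0.8" across line breaks.
    // Assumption: the keys always appear in this order, as in the page snippet.
    private static final Pattern ENTRY = Pattern.compile(
            "\"api\"\\s*:\\s*(\\d+)\\s*,"
            + "\\s*\"name\"\\s*:\\s*\"([^\"]*)\"\\s*,"
            + "\\s*\"perc\"\\s*:\\s*\"(\\d+(?:\\.\\d+)?)\"");

    /** Returns every api/name/perc triple found in the given script text. */
    static List<Entry> parse(String scriptText) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(scriptText);
        while (m.find()) {
            entries.add(new Entry(
                    Integer.parseInt(m.group(1)),
                    m.group(2),
                    Float.parseFloat(m.group(3))));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Shortened sample in the same shape as the VERSION_DATA block.
        String sample = "var VERSION_DATA = [ { \"data\": [\n"
                + "  {\n    \"api\": 8,\n    \"name\": \"Froyo\",\n    \"perc\": \"0.8\"\n  },\n"
                + "  {\n    \"api\": 19,\n    \"name\": \"KitKat\",\n    \"perc\": \"13.6\"\n  }\n"
                + "] } ];";
        for (Entry e : parse(sample)) {
            System.out.println(e.api + " " + e.name + " " + e.perc);
        }
    }
}
```

The resulting values could then be copied into the `Table` objects from the question before the insert. If the page changes its key order or formatting, the pattern will stop matching, so a real JSON parser would be more robust.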

For "mysql - jsoup HTML parser selects the wrong table", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/24055224/
