java - Parsing PHP data from an HTML webpage using jsoup

Tags: java php html parsing jsoup

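I'm not entirely sure how to phrase this question or its title, so here goes. I'm using jsoup to parse a webpage ( http://champion.gg/statistics/ ), and I'm trying to get the statistics from its table with this code: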

public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        System.out.println(doc.toString());
        Element table = doc.select("table[class=table table-striped]").first();
        Element tbody = table.select("tbody").first();
        Iterator<Element> rows = tbody.select("tr").iterator();
        rows.forEachRemaining(row -> {
            System.out.println(row.toString());
        });
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}

It produces this result:

<tr ng-repeat="champion in filteredChampions = (championData | startsWith:search.title | filter:roleSort | orderBy:[order+sortExpression.sortBy,order+sortExpression.lastSortBy])"> 
 <td class="rank">{{indexNumber($index, filteredChampions.length)}}</td> 
 <td ng-class="{'selected-column':determineSelected('title')}"> <a href="/champion/{{champion.key}}/{{champion.role}}"> 
  <div class="tsm-tooltip tsm-angular-champion-tt" data-type="champions" data-name="{{champion.key}}" data-id="{{matchupData}}"> 
   <div class="matchup-champion {{champion.key}}"></div> 
   <span class="stat-champ-title">{{champion.title}}</span> 
  </div> </a> </td> 
 <td class="stats-role-title" ng-class="{'selected-column':determineSelected('role')}">{{champion.role}}</td> 
 <td ng-class="{'selected-column':determineSelected('winPercent')}"> <span ng-class="{'top-half': (champion.general.winPercent >= 50), 'bottom-half': (champion.general.winPercent < 50)}">{{champion.general.winPercent}}%</span> </td> 
 <td ng-class="{'selected-column':determineSelected('playPercent')}">{{champion.general.playPercent}}%</td> 
 <td ng-class="{'selected-column':determineSelected('banRate')}">{{champion.general.banRate}}%</td> 
 <td ng-class="{'selected-column':determineSelected('experience')}">{{champion.general.experience}}</td> 
 <td ng-class="{'selected-column':determineSelected('kills')}">{{champion.general.kills}}</td> 
 <td ng-class="{'selected-column':determineSelected('deaths')}">{{champion.general.deaths}}</td> 
 <td ng-class="{'selected-column':determineSelected('assists')}">{{champion.general.assists}}</td> 
 <td ng-class="{'selected-column':determineSelected('largestKillingSpree')}">{{champion.general.largestKillingSpree}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalDamageDealtToChampions')}">{{champion.general.totalDamageDealtToChampions}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalDamageTaken')}">{{champion.general.totalDamageTaken}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalHeal')}">{{champion.general.totalHeal}}</td> 
 <td ng-class="{'selected-column':determineSelected('minionsKilled')}">{{champion.general.minionsKilled}}</td> 
 <td ng-class="{'selected-column':determineSelected('neutralMinionsKilledEnemyJungle')}">{{champion.general.neutralMinionsKilledEnemyJungle}}</td> 
 <td ng-class="{'selected-column':determineSelected('neutralMinionsKilledTeamJungle')}">{{champion.general.neutralMinionsKilledTeamJungle}}</td> 
 <td ng-class="{'selected-column':determineSelected('goldEarned')}">{{champion.general.goldEarned}}</td> 
 <td ng-class="{'selected-column':determineSelected('overallPosition')}">{{champion.general.overallPosition}}</td> 
 <td ng-class="{'selected-column':determineSelected('overallPositionChange')}"><span class="glyphicon" ng-class="{'glyphicon-arrow-up': (champion.general.overallPositionChange > 0), 'glyphicon-arrow-down': (champion.general.overallPositionChange < 0), 'same-position': (champion.general.overallPositionChange === 0)}">{{Math.abs(champion.general.overallPositionChange)}}</span></td> 
</tr>

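The result I get doesn't show a champion's actual average kills; it shows the literal text champion.general.kills instead. How can I parse the page so that it gives the actual value, e.g. 8, rather than champion.general.kills?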
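The `{{...}}` text in each cell is an AngularJS-style binding (note the `ng-repeat` and `ng-class` attributes), not data. A minimal diagnostic sketch, using a hypothetical helper that is not part of jsoup, scans cell text for such bindings; a non-empty result means the values are filled in later by JavaScript and are not present in the HTML that jsoup downloaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateCheck {
    // Matches a binding such as {{champion.general.kills}} and captures
    // the expression between the double braces.
    private static final Pattern BINDING = Pattern.compile("\\{\\{\\s*(.+?)\\s*\\}\\}");

    // Hypothetical helper: returns every binding expression found in the text.
    // An empty list suggests the cell holds real data rather than a template.
    public static List<String> unrenderedBindings(String cellText) {
        List<String> expressions = new ArrayList<>();
        Matcher m = BINDING.matcher(cellText);
        while (m.find()) {
            expressions.add(m.group(1));
        }
        return expressions;
    }

    public static void main(String[] args) {
        // Cell text as it appears for the kills column in the output above.
        String cell = "{{champion.general.kills}}";
        System.out.println(unrenderedBindings(cell)); // prints [champion.general.kills]
    }
}
```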

Best answer

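When it comes to extracting data from a webpage, you have to go to where the data actually is. The {{...}} placeholders in the table are AngularJS template expressions that the browser fills in by running JavaScript; jsoup only downloads and parses the HTML and does not execute scripts, so it sees the template exactly as the server sent it. In this case the data is still embedded in the page, which is good: you need to get the script tag that contains it and parse that. This example code currently assumes it is the script tag at index 11.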
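Hard-coding index 11 breaks as soon as the page adds or removes a script. A minimal sketch of an alternative, assuming the data script is the only one containing the text `"kills":`, is to choose the script by its content. To stay independent of jsoup, the hypothetical helper below takes the script bodies as plain strings; with jsoup they would be `script.data()` for each element of `doc.select("script")`.

```java
import java.util.Arrays;
import java.util.List;

public class ScriptFinder {
    // Hypothetical helper: returns the index of the first script body that
    // contains the marker, or -1 if none does.
    public static int findScriptIndex(List<String> scriptBodies, String marker) {
        for (int i = 0; i < scriptBodies.size(); i++) {
            if (scriptBodies.get(i).contains(marker)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up script bodies standing in for the real page's scripts.
        List<String> bodies = Arrays.asList(
                "var analytics = {};",
                "",
                "var data = [{\"_id\":\"abc\",\"key\":\"Urgot\",\"general\":{\"kills\":6.47}}];");
        System.out.println(findScriptIndex(bodies, "\"kills\":")); // prints 2
    }
}
```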

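import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ChampionStats
{
    public static void main(String[] args)
    {
        try
        {
            Document doc = Jsoup
                    .connect("http://champion.gg/statistics/")
                    .userAgent(
                            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                    .get();
            // All <script> elements on the page; the champion data is in one of them.
            Elements scripts = doc.select("script");
            // Assumes the data is in the script at index 11 (depends on the page layout).
            Element script = scripts.get(11);
            parseText(script);
        }
        catch (IOException exception)
        {
            // Report the failure instead of swallowing it silently.
            exception.printStackTrace();
        }
    }

    public static void parseText(Element script)
    {
        // Raw JavaScript inside the <script> tag.
        String text = script.data().trim();
        int index = text.indexOf("_id");
        while (index >= 0)
        {
            index += 6; // skip past _id":" to the beginning of the value
            int endQuote = text.indexOf("\"", index);
            String id = text.substring(index, endQuote);
            index = text.indexOf("\"key\":\"", endQuote);
            endQuote = text.indexOf("\"", index + 8);
            String key = text.substring(index, endQuote);
            index = text.indexOf("\"kills\":", endQuote);
            endQuote = text.indexOf(",", index);
            String kills = text.substring(index, endQuote);
            System.out.println(id + key + kills);
            // Drop the part already parsed and search again from the start of the
            // shortened string; the old offsets no longer apply after substring().
            text = text.substring(endQuote);
            index = text.indexOf("_id");
        }
    }
}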

Output:

5812965753fa9743395ee93a"key":"Urgot"kills":6.47

5812965753fa9743395ee93b"key":"Aatrox"kills":5.8

5812965753fa9743395ee93d"key":"Galio"kills":4.58

5812965753fa9743395ee940"key":"Kled"kills":7.3 ...
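The substring offsets above leave the field names in the output. A minimal sketch of a cleaner extraction with java.util.regex, assuming (as the output suggests) that each record lists "key" before "kills" and that "kills" is a plain number; the sample string is made up from the values above, and a real JSON parser would be more robust against changes in the data's shape.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KillsExtractor {
    // Captures a "key" string value and the first "kills" number after it.
    // DOTALL lets .*? cross line breaks inside the script text.
    private static final Pattern KEY_THEN_KILLS = Pattern.compile(
            "\"key\":\"([^\"]*)\".*?\"kills\":(-?[0-9]+(?:\\.[0-9]+)?)",
            Pattern.DOTALL);

    // Returns (key, kills) pairs in page order. A list rather than a map,
    // because the same champion may appear once per role.
    public static List<Map.Entry<String, Double>> killsByChampion(String scriptText) {
        List<Map.Entry<String, Double>> result = new ArrayList<>();
        Matcher m = KEY_THEN_KILLS.matcher(scriptText);
        while (m.find()) {
            result.add(new AbstractMap.SimpleEntry<>(m.group(1), Double.parseDouble(m.group(2))));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample shaped like the records the answer parses.
        String sample = "[{\"_id\":\"5812965753fa9743395ee93a\",\"key\":\"Urgot\","
                + "\"general\":{\"kills\":6.47}},"
                + "{\"_id\":\"5812965753fa9743395ee93b\",\"key\":\"Aatrox\","
                + "\"general\":{\"kills\":5.8}}]";
        for (Map.Entry<String, Double> e : killsByChampion(sample)) {
            System.out.println(e.getKey() + " " + e.getValue()); // Urgot 6.47, then Aatrox 5.8
        }
    }
}
```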

Related Stack Overflow question (java - Parsing PHP data from an HTML webpage using jsoup): https://stackoverflow.com/questions/40570505/
