I'm not sure how best to phrase or title this question, so here goes. I am using jsoup to parse a webpage (http://champion.gg/statistics/) and I'm trying to grab the stats from its table using this code:

public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        System.out.println(doc.toString());
        Element table = doc.select("table[class=table table-striped]").first();
        Element tbody = table.select("tbody").first();
        Iterator<Element> rows = tbody.select("tr").iterator();
        rows.forEachRemaining(row -> {
            System.out.println(row.toString());
        });
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}

and it produces this result:

<tr ng-repeat="champion in filteredChampions = (championData | startsWith:search.title | filter:roleSort | orderBy:[order+sortExpression.sortBy,order+sortExpression.lastSortBy])"> 
 <td class="rank">{{indexNumber($index, filteredChampions.length)}}</td> 
 <td ng-class="{'selected-column':determineSelected('title')}"> <a href="/champion/{{champion.key}}/{{champion.role}}"> 
  <div class="tsm-tooltip tsm-angular-champion-tt" data-type="champions" data-name="{{champion.key}}" data-id="{{matchupData}}"> 
   <div class="matchup-champion {{champion.key}}"></div> 
   <span class="stat-champ-title">{{champion.title}}</span> 
  </div> </a> </td> 
 <td class="stats-role-title" ng-class="{'selected-column':determineSelected('role')}">{{champion.role}}</td> 
 <td ng-class="{'selected-column':determineSelected('winPercent')}"> <span ng-class="{'top-half': (champion.general.winPercent >= 50), 'bottom-half': (champion.general.winPercent < 50)}">{{champion.general.winPercent}}%</span> </td> 
 <td ng-class="{'selected-column':determineSelected('playPercent')}">{{champion.general.playPercent}}%</td> 
 <td ng-class="{'selected-column':determineSelected('banRate')}">{{champion.general.banRate}}%</td> 
 <td ng-class="{'selected-column':determineSelected('experience')}">{{champion.general.experience}}</td> 
 <td ng-class="{'selected-column':determineSelected('kills')}">{{champion.general.kills}}</td> 
 <td ng-class="{'selected-column':determineSelected('deaths')}">{{champion.general.deaths}}</td> 
 <td ng-class="{'selected-column':determineSelected('assists')}">{{champion.general.assists}}</td> 
 <td ng-class="{'selected-column':determineSelected('largestKillingSpree')}">{{champion.general.largestKillingSpree}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalDamageDealtToChampions')}">{{champion.general.totalDamageDealtToChampions}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalDamageTaken')}">{{champion.general.totalDamageTaken}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalHeal')}">{{champion.general.totalHeal}}</td> 
 <td ng-class="{'selected-column':determineSelected('minionsKilled')}">{{champion.general.minionsKilled}}</td> 
 <td ng-class="{'selected-column':determineSelected('neutralMinionsKilledEnemyJungle')}">{{champion.general.neutralMinionsKilledEnemyJungle}}</td> 
 <td ng-class="{'selected-column':determineSelected('neutralMinionsKilledTeamJungle')}">{{champion.general.neutralMinionsKilledTeamJungle}}</td> 
 <td ng-class="{'selected-column':determineSelected('goldEarned')}">{{champion.general.goldEarned}}</td> 
 <td ng-class="{'selected-column':determineSelected('overallPosition')}">{{champion.general.overallPosition}}</td> 
 <td ng-class="{'selected-column':determineSelected('overallPositionChange')}"><span class="glyphicon" ng-class="{'glyphicon-arrow-up': (champion.general.overallPositionChange > 0), 'glyphicon-arrow-down': (champion.general.overallPositionChange < 0), 'same-position': (champion.general.overallPositionChange === 0)}">{{Math.abs(champion.general.overallPositionChange)}}</span></td> 
</tr>

Instead of the average number of kills for a specific champion, the result contains the template placeholder {{champion.general.kills}}. How do I parse the page so that I get the actual value, such as 8, instead of the placeholder?

  • It looks like the website is using Angular to inject the statistics in the view. Maybe this answer could help you. Commented Nov 13, 2016 at 11:23
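
jsoup does not execute JavaScript, so a quick way to confirm that the fetched HTML is still an unrendered Angular template is to look for leftover {{...}} bindings in it. The following is a minimal sketch using only the JDK; the class name TemplateCheck and the method findBindings are made up for illustration and are not part of jsoup or the site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateCheck {
    // Matches an AngularJS interpolation such as {{champion.general.kills}}
    // and captures the expression between the braces.
    private static final Pattern BINDING = Pattern.compile("\\{\\{\\s*(.+?)\\s*\\}\\}");

    /** Returns the expressions of all unrendered {{...}} bindings found in the HTML. */
    static List<String> findBindings(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = BINDING.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String row = "<td>{{champion.general.kills}}</td><td>{{champion.role}}</td>";
        // A non-empty list means the values are filled in by JavaScript in the browser.
        System.out.println(findBindings(row));
    }
}
```

If the list is non-empty for the table rows, the real values have to come from somewhere else in the page or from a separate request.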

2 Answers

When extracting data from a webpage, you have to go to where the data actually lives. In this case the data is still within the page, which is good: it sits inside a script tag rather than in the table markup. You need to select that script tag and parse its contents. For now, this sample code assumes it is the script tag at index 11 (the twelfth one on the page).

public static void main(String[] args)
{
    try
    {
        Document doc = Jsoup
                .connect("http://champion.gg/statistics/")
                .userAgent(
                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                .get();
        System.out.println(doc.toString());
        Elements table = doc.select("script");
        Element script = table.get(11);
        parseText(script);
    }
    catch (IOException exception)
    {
        // Report the failure instead of silently swallowing it
        exception.printStackTrace();
    }
}

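public static void parseText(Element script)
{
    // The script body is a single DataNode holding the raw JavaScript/JSON text
    String text = ((DataNode) script.childNode(0)).toString().trim();
    int index = text.indexOf("_id");
    while (index > 0)
    {
        index += 6;// Skip past _id":" to the beginning of the value
        int endQuote = text.indexOf("\"", index);
        String id = text.substring(index, endQuote);
        // index is not advanced past the "key":" marker, so the marker stays in the output
        index = text.indexOf("\"key\":\"", endQuote);
        endQuote = text.indexOf("\"", index + 8);
        String key = text.substring(index, endQuote);
        // Likewise the "kills": marker is kept; the value ends at the next comma
        index = text.indexOf("\"kills\":", endQuote);
        endQuote = text.indexOf(",", index);
        String kills = text.substring(index, endQuote);
        // Drop the part that has already been processed
        text = text.substring(endQuote);
        // Caution: index is still an offset into the old, longer string, so
        // searching from it in the shortened text can skip over entries
        index = text.indexOf("_id", index);
        System.out.println(id + key + kills);
    }
}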

Output:

5812965753fa9743395ee93a"key":"Urgot"kills":6.47

5812965753fa9743395ee93b"key":"Aatrox"kills":5.8

5812965753fa9743395ee93d"key":"Galio"kills":4.58

5812965753fa9743395ee940"key":"Kled"kills":7.3 ...
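
The scanning loop above can also be sketched without truncating the string, so that every search offset refers to the same text and the "key" / "kills" markers are not carried into the output. This is only a sketch under assumptions: it expects compact JSON (no spaces around the colons) with "_id", "key" and "kills" appearing in that order in each entry, as the output above suggests, and ScriptScanner and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ScriptScanner {
    /** Returns the index just past the next occurrence of marker, or -1 if absent. */
    private static int after(String text, String marker, int from) {
        int i = text.indexOf(marker, from);
        return i < 0 ? -1 : i + marker.length();
    }

    /** Collects "id key kills" for every entry found in the script text. */
    static List<String> extract(String text) {
        List<String> rows = new ArrayList<>();
        int cursor = 0; // always an offset into the same, unmodified string
        while (true) {
            int idStart = after(text, "\"_id\":\"", cursor);
            if (idStart < 0) break;
            int idEnd = text.indexOf('"', idStart);
            if (idEnd < 0) break;
            int keyStart = after(text, "\"key\":\"", idEnd);
            if (keyStart < 0) break;
            int keyEnd = text.indexOf('"', keyStart);
            if (keyEnd < 0) break;
            int killsStart = after(text, "\"kills\":", keyEnd);
            if (killsStart < 0) break;
            // A JSON number ends at the first character that is not a digit, '.' or '-'.
            int killsEnd = killsStart;
            while (killsEnd < text.length()
                    && "0123456789.-".indexOf(text.charAt(killsEnd)) >= 0) {
                killsEnd++;
            }
            rows.add(text.substring(idStart, idEnd) + " "
                    + text.substring(keyStart, keyEnd) + " "
                    + text.substring(killsStart, killsEnd));
            cursor = killsEnd; // continue after the entry just read
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample text in the assumed shape; the real script may differ.
        String sample = "[{\"_id\":\"5812965753fa9743395ee93a\",\"key\":\"Urgot\","
                + "\"general\":{\"kills\":6.47,\"deaths\":5.1}}]";
        extract(sample).forEach(System.out::println);
    }
}
```

Hand-rolled scanning like this stays brittle if the field order or spacing changes, which is why a real JSON parser is usually the safer choice.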

2 Comments

While this works for 20 champions, I (in all honesty) do not fully understand your code. I can understand selecting the script, but why do you have to use .get(11), and what does that do? I will try to do research on my own in the meantime. I also do not understand why you are using substrings; shouldn't there be an easier way to read the data within the script? It looks like JSON, so I was hoping I could read it more easily as objects. Thank you very much for your help either way!
The .get(11) call returns the twelfth script tag on the page (the index is zero-based); eleven other script tags come before it. There may be an easier way, but I don't know much about JSON, so I resort to low-level tactics to get by.
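
Since a fixed index like 11 breaks as soon as the page adds or removes a script tag, one alternative is to pick the script by what it contains. The sketch below works on plain strings so that it runs without jsoup; with jsoup, the bodies could be collected by calling data() on each element returned by doc.select("script"). The class and method names are hypothetical, and the "_id" marker is an assumption based on the output shown earlier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ScriptPicker {
    /** Returns the first script body that contains the marker, if any. */
    static Optional<String> firstContaining(List<String> scriptBodies, String marker) {
        return scriptBodies.stream()
                .filter(body -> body.contains(marker))
                .findFirst();
    }

    public static void main(String[] args) {
        // Stand-ins for the text of each <script> tag on the page.
        List<String> bodies = Arrays.asList(
                "var analytics = {};",
                "var stats = [{\"_id\":\"5812965753fa9743395ee93a\",\"key\":\"Urgot\"}];",
                "console.log('ready');");
        System.out.println(firstContaining(bodies, "\"_id\"").orElse("no data script found"));
    }
}
```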

I found the answer with ProgrammersBlock's help. By retrieving the script data, I was able to turn the JSON into full Java objects!

package com.databot.web.parser;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;

import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import com.databot.Program;
import com.databot.Settings;
import com.databot.champions.ChampionStats;
import com.databot.champions.Champion;
import com.google.gson.stream.JsonReader;

public class WebParser {

public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        Elements table = doc.select("script");
        Element script = table.get(11);
        parseText(script);
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}

public void parseText(Element script)
{
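    // Skip the first 22 characters, which precede the JSON array in this page's
    // script (presumably a JavaScript assignment), leaving only the JSON text
    String text = ((DataNode) script.childNode(0)).toString().substring(22).trim();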
    System.out.println(text);
    List<Champion> champions = new ArrayList<>();
    try {
        JsonReader reader = new JsonReader(new StringReader(text));
        reader.setLenient(true);
        reader.beginArray();
        while(reader.hasNext()) {
            reader.beginObject();
            String id = "", key = "", role = "", title = "";
            ChampionStats stats = new ChampionStats(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0);
            while(reader.hasNext()) {
                String name = reader.nextName();
                if(name.equalsIgnoreCase("_id")) {
                    id = reader.nextString();
                } else if(name.equalsIgnoreCase("key")) {
                    key = reader.nextString();
                } else if(name.equalsIgnoreCase("role")) {
                    role = reader.nextString();
                } else if(name.equalsIgnoreCase("title")) {
                    title = reader.nextString();
                } else if(name.equalsIgnoreCase("general")) {
                    double winPercent = 0, playPercent = 0, banRate = 0, experience = 0, kills = 0, deaths = 0, assists = 0, totalDamageDealtToChampions = 0, totalDamageTaken = 0, totalHeal = 0, largestKillingSpree = 0, minionsKilled = 0, neutralMinionsKilledTeamJungle = 0, neutralMinionsKilledEnemyJungle = 0, goldEarned = 0; 
                    int overallPosition = 0, overallPositionChange = 0;
                    reader.beginObject();
                    while(reader.hasNext()) {
                        String gName = reader.nextName();
                        if(gName.equalsIgnoreCase("winPercent")) {
                            winPercent = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("playPercent")) {
                            playPercent = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("banRate")) {
                            banRate = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("experience")) {
                            experience = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("kills")) {
                            kills = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("deaths")) {
                            deaths = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("assists")) {
                            assists = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("totalDamageDealtToChampions")) {
                            totalDamageDealtToChampions = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("totalDamageTaken")) {
                            totalDamageTaken = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("totalHeal")) {
                            totalHeal = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("largestKillingSpree")) {
                            largestKillingSpree = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("minionsKilled")) {
                            minionsKilled = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("neutralMinionsKilledTeamJungle")) {
                            neutralMinionsKilledTeamJungle = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("neutralMinionsKilledEnemyJungle")) {
                            neutralMinionsKilledEnemyJungle = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("goldEarned")) {
                            goldEarned = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("overallPosition")) {
                            overallPosition = reader.nextInt();
                        } else if(gName.equalsIgnoreCase("overallPositionChange")) {
                            overallPositionChange = reader.nextInt();
                        } else {
                            reader.skipValue();
                        }
                    }
                    reader.endObject();
                    stats = new ChampionStats(winPercent, playPercent, banRate, experience, kills, deaths, assists, totalDamageDealtToChampions, totalDamageTaken, totalHeal, largestKillingSpree, minionsKilled, neutralMinionsKilledTeamJungle, neutralMinionsKilledEnemyJungle, goldEarned, overallPosition, overallPositionChange);
                } else {
                    reader.skipValue();
                }
            }
            reader.endObject();
            champions.add(new Champion(id, key, role, title, stats));
        }
        reader.endArray();
        reader.close();
    } catch (Exception e) {
        Program.alert("Error reading JSON data!");
        e.printStackTrace();
    }
    champions.forEach(champion -> {
        System.out.println(champion.toString());
    });
}
}

This is my full WebParser class if anyone is interested. I'm sure there is a better or more efficient way to write this, but this is what has worked for me so far!
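
One fragile spot in the class above is substring(22), which depends on the exact length of whatever precedes the JSON in the script. A hedged alternative is to slice from the first '[' to the last ']'. The sketch below assumes the script contains exactly one top-level JSON array and no other square brackets outside it; JsonSlice, extractArray and the sample script text are made up for illustration.

```java
final class JsonSlice {
    /**
     * Returns the substring from the first '[' to the last ']' inclusive.
     * Throws if the script does not appear to contain an array.
     */
    static String extractArray(String script) {
        int start = script.indexOf('[');
        int end = script.lastIndexOf(']');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("no JSON array found in script");
        }
        return script.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // Hypothetical script body: an assignment followed by the JSON array.
        String script = "var data = [{\"key\":\"Urgot\",\"general\":{\"kills\":6.47}}];";
        System.out.println(extractArray(script));
    }
}
```

The returned string could then be handed to the JsonReader in place of the substring(22) result, which removes the dependency on the prefix length.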
