Identical divs matched several times, producing duplicate results (Jsoup)


Today I ran into a problem while using the Jsoup library. The website contains the data I need, but it sits in several identical divs that have no class, only a 'style' attribute. I need to extract the numeric width value from each of them.

<div style="height: 12px; width: 196px; background-color: #5C1; float: left; border-right: 1px solid #111;"></div>

I always end up with one of two results: 1. The value is printed, but only for one div; 2. The values are printed for all divs, but with many duplicates, each treated as a new line.

    int sd = 0; // counter used as a guard below (its declaration was missing from the snippet)
    try {
        Document doc = Jsoup.connect("http://www.lolking.net/summoner/euw/34201718").get(); // random player
        Elements elem = doc.getElementsByTag("div");
        Scanner scn = new Scanner(elem.toString());
        while (scn.hasNextLine()) {
            String res = scn.nextLine();
            if (res.contains("<div style=\"height: 12px; width: ") && res.contains("px; background-color: #5C1; float: left; border-right: 1px solid #111;\"></div>")) {
                // Guard against flooding. I know this is a problem, but without it the output
                // floods with duplicate numbers that I can't filter out, because the program
                // treats every line as a separate match.
                if (sd == 0) {
                    String t1 = res.replace("                 <div style=\"height: 12px; width: ", "");
                    String t2 = t1.replace("px; background-color: #5C1; float: left; border-right: 1px solid #111;\"></div> ", "");
                    System.out.println(t2); // prints 192... and so on
                    sd += 1;
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

I have spent all day trying different solutions, but I always end up with one of these two cases. I only recently started learning Java. Thanks.


There is 1 answer

nafas

If I understand you correctly, you need the width from the style attribute of every matching div tag.

You can certainly use Jsoup to do it:

Edited code:

    // Style declarations that identify the target divs; only the width differs between them.
    Set<String> known = new HashSet<String>();
    known.add("height: 12px");
    known.add("background-color: #5C1");
    known.add("float: left");
    known.add("border-right: 1px solid #111");
    Document doc = Jsoup.connect("http://www.lolking.net/summoner/euw/34201718").get();
    Elements elements = doc.select("div");
    for (Element e : elements) {
        if (e.hasAttr("style")) {
            // Split the style attribute into individual, trimmed declarations.
            Set<String> splitted = new HashSet<String>();
            for (String s : e.attr("style").split(";")) {
                splitted.add(s.trim());
            }
            // Keep only divs that carry every known declaration.
            if (splitted.containsAll(known)) {
                splitted.removeAll(known);
                // Whatever remains should be the width declaration.
                for (String s : splitted) {
                    if (s.startsWith("width:")) {
                        System.out.println(s);
                    }
                }
            }
        }
    }
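
The loop above prints the whole declaration (for example "width: 196px"), while the question asks for the number itself. A minimal sketch of one way to pull the integer out of a style string follows. It uses only the standard library and assumes widths are given in whole pixels. StyleWidth and parseWidthPx are hypothetical names introduced for illustration; they are not part of Jsoup. In the loop above, the input would be the value of e.attr("style").

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jsoup or of the code above.
final class StyleWidth {
    // Matches a "width" declaration with a whole-pixel value, e.g. "width: 196px".
    // The (?:^|;) prefix keeps properties such as "max-width" from matching.
    // At most 9 digits are accepted so Integer.parseInt cannot overflow.
    private static final Pattern WIDTH_PX =
            Pattern.compile("(?:^|;)\\s*width\\s*:\\s*(\\d{1,9})px", Pattern.CASE_INSENSITIVE);

    // Returns the pixel width found in the style string, or empty if there is none.
    static OptionalInt parseWidthPx(String style) {
        if (style == null) {
            return OptionalInt.empty();
        }
        Matcher m = WIDTH_PX.matcher(style);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String style = "height: 12px; width: 196px; background-color: #5C1; "
                + "float: left; border-right: 1px solid #111;";
        parseWidthPx(style).ifPresent(System.out::println); // prints 196
    }
}
```

Returning OptionalInt rather than a sentinel such as -1 makes the "no width present" case explicit, so divs without a pixel width can simply be skipped.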