The page in question is:

http://www.centerplex.com.br/

My method:

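import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public String getHtml(String urlStr, String charset) throws Exception {
    // Route the request through the proxy (address and port redacted).
    System.setProperty("http.proxyHost", "XXX.XX.X.XXX");
    System.setProperty("http.proxyPort", "XXXX");
    URL url = new URL(urlStr);
    URLConnection conn = url.openConnection();
    StringBuilder html = new StringBuilder();
    // Read from the connection opened above instead of opening a second one
    // with url.openStream(); try-with-resources closes the stream when done.
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), charset))) {
        String linha = br.readLine();
        while (linha != null) {
            System.out.println(linha);
            html.append(linha); // as before, line breaks are not kept in the result
            linha = br.readLine();
        }
    }
    return html.toString();
}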

This method works well for other pages, but for this particular page it gives me incomplete HTML.

I see a lot of JavaScript throughout the page, but I don't know whether it has any influence.

Here is the HTML my method returns for this page:

<!doctype html>
<html>
    <head>
        <title>Centerplex Cinemas</title>
        <meta charset="iso-8859-1">
        <meta name="description" content="">
        <meta name="keywords" content="">
        <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;">
        <link href="apple-touch-icon.png" rel="apple-touch-icon" type="image/png">
        <link href="lib/css/estilo.css" rel="stylesheet" type="text/css">
    </head>
    <body>


                <div class="tematizacao">
                    <iframe src="//www.youtube.com/embed/" class="trailer" frameborder="0" allowfullscreen></iframe>
                    <img src="http://www.centerplex.com.br/fotos/wallpaper_mobile/470.jpg" />
                </div>



           <div class="header">

     <h1><a href="index.php" title="Centerplex">Centerplex</a></h1>

   </div>        <div class="efilme">
            <a href="http://www.centerplex.com.br/mobile/filme.php?cf=5807" title="Kung Fu Panda 3"><img src="http://www.centerplex.com.br/fotos/hp_mobile/188.jpg" title="Kung Fu Panda 3" alt="Kung Fu Panda 3" width="100%"></a>
                    </div>
        <ul class="nav">
            <li><a href="lancamentos.php" title="Estreias / Em Cartaz">Estreias / Em Cartaz</a></li>
            <li><a href="salas-horarios.php" title="Salas & Horários">Salas & Horários</a></li>
                    </ul>
           <ul class="fnav">

      <li><a href="breve.php" title="Em Breve" class="breve">Em Breve</a></li>

      <li><a href="promocoes.php" title="Promoções" class="promo">Promoções</a></li>

      <li><a href="corporativo.php" title="Corporativo" class="corp">Corporativo</a></li>

      <li class="nbr"><a href="faleconosco.php" title="Fale Conosco" class="fale">Fale Conosco</a></li>

   </ul>           <div class="footer">
     <p>©Centerplex 2016</p>
   </div>
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-3269539-1', 'auto');
  ga('send', 'pageview');

</script>

    </body>
</html>

1 Answer

Xupypr MV

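There are no bugs in your code. It looks like the server returns different content depending on the request: the HTML you received links to /mobile/filme.php and to wallpaper_mobile and hp_mobile images, which suggests you were served the mobile version of the site. Try making the request with the Apache HttpClient library and imitating a browser request: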

import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

public class NewClass {
    public static void main(String[] args) throws IOException {
        String HOST = "www.centerplex.com.br";
        // A browser loads a page with GET, so use HttpGet rather than HttpPost.
        HttpGet get = new HttpGet("http://" + HOST + "/");
        // HttpClient adds Host, Connection and Accept-Encoding on its own, so only
        // the headers that make the request look like a browser are set here.
        get.setHeader("Accept", "*/*");
        get.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36");
        get.setHeader("DNT", "1");
        get.setHeader("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6");
        // try-with-resources closes both the client and the response.
        try (CloseableHttpClient client = HttpClientBuilder.create().build();
             CloseableHttpResponse response = client.execute(get)) {
            // Decode with the charset from the Content-Type header if there is one,
            // otherwise fall back to ISO-8859-1, which the returned page declares.
            String responseText = EntityUtils.toString(response.getEntity(), "ISO-8859-1");
            System.out.println(responseText);
        }
    }
}
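
If you would rather not add a dependency, the same idea can probably be tried with the plain java.net classes from the question. The following is only a sketch, assuming the server chooses between the mobile and the full page based on the User-Agent header; I have not verified that against this site. The names BrowserLikeFetch, fetch and readAll are made up for illustration, and the User-Agent value is simply the one used above. The http.proxyHost and http.proxyPort system properties from the question still apply if you set them before calling fetch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BrowserLikeFetch {

    // Assumption: a desktop browser User-Agent is enough to get the full page.
    static final String DESKTOP_USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36";

    // Reads the whole stream as text, ending every line with '\n'.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Sends a GET request with the given User-Agent and returns the response body.
    static String fetch(String urlStr, String userAgent, Charset charset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setRequestProperty("Accept", "*/*");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            return readAll(conn.getInputStream(), charset);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("http://www.centerplex.com.br/",
                DESKTOP_USER_AGENT, StandardCharsets.ISO_8859_1);
        System.out.println(html);
    }
}
```

If the output still contains the /mobile/ links, the User-Agent is probably not what the server looks at, and it would be worth comparing the remaining request headers with the ones your browser sends.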