C# httpwebrequest blank chars in responsestream

897 views Asked by At

I'm trying to read the reponse from a webserver using httpwebrequests in C#. I use the following code:

UriBuilder urib = new UriBuilder();
urib.Host = "wikipedia.com";

HttpWebRequest req = WebRequest.CreateHttp(urib.Uri);
req.KeepAlive = false;
req.Host = "wikipedia.com/";
req.Method = "GET";

HttpWebResponse response = (HttpWebResponse) req.GetResponse();
byte[] buffer = new byte[response.ContentLength];
System.IO.Stream stream = response.GetResponseStream();
stream.Read(buffer, 0, buffer.Length);

Console.WriteLine(System.Text.Encoding.ASCII.GetString(buffer, 0, buffer.Length));

The code does indeed retrieve the correct amount of data (I compared the contentlength used to create the buffer, with the length of the console output, they're the same. My problem is that the last 80% or so of the response is blank chars. They're all 0x00. I tested this with several pages, including wikipedia.com and it just cuts off mid-file for some reason.

Have I misunderstood/misused the way to use webrequests or can anyone spot an error here?

2

There are 2 answers

1
Danilo Vulović On BEST ANSWER

Try to use this method:

public static String GetResponseString(Uri url, CookieContainer cc)
{
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
    request.Method = WebRequestMethods.Http.Get;
    request.CookieContainer = cc;
    request.AutomaticDecompression = DecompressionMethods.GZip;

    HttpWebResponse response = (HttpWebResponse)request.GetResponse();

    StreamReader reader = new StreamReader(response.GetResponseStream());

    String responseString = reader.ReadToEnd();

    response.Close();

    return responseString;
}
0
James On

There are a couple of issues with your code:

  1. Your trying to read the entire response in one go using Stream.Read - that's not what it was designed for. This should be used for more optimal reading e.g. 4KB chunks.

  2. Your reading a HTML response as ASCII encoding - are you sure the page doesn't contain any Unicode characters? I would stick to UTF-8 encoding to be on the safe side (or alternatively read the Content-Type header in the response).

When reading characters from a byte stream (which is what your response is essentially) the recommended approach is to use StreamReader. More specifically, if you want to read the entire stream in one go then use StreamReader.ReadToEnd.

Your code could be shortened to:

HttpWebRequest req = WebRequest.CreateHttp(new Uri("http://wikipedia.org"));
req.Method = WebRequestMethods.Http.Get;
using (var response = (HttpWebResponse)req.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    Console.WriteLine(reader.ReadToEnd());
}