HTTPS GET requests with TcpClient in .NET

I want to retrieve the HTML of a web page given its address:

var url = "https://www.stackoverflow.com/questions";
var uri = new Uri(url);
var host = uri.Host;
using var client = new TcpClient();
client.Connect(host, 443);
using SslStream sslStream = new SslStream(client.GetStream(), 
    false,
    new RemoteCertificateValidationCallback(ValidateServerCertificate), 
    null
);

var message = @$"GET {uri.AbsolutePath} HTTP/1.1
Accept: text / html, charset = utf - 8
Connection: close
Host: {host}
" + "\r\n\r\n";
sslStream.AuthenticateAsClient(host);
using var reader = new StreamReader(sslStream, Encoding.UTF8);
byte[] bytes = Encoding.UTF8.GetBytes(message);
sslStream.Write(bytes, 0, bytes.Length);
var response = reader.ReadToEnd();

public static bool ValidateServerCertificate(
    object sender, 
    X509Certificate certificate,
    X509Chain chain, 
    SslPolicyErrors sslPolicyErrors)
{
    // Accepts every certificate, which disables TLS validation; acceptable only for testing.
    return true;
}

This code is very inconsistent: I can receive 302, 301, 403, or 200 responses.
I would like to understand what causes this inconsistency and how it can be fixed.

There is 1 answer

Answer by IOEnthusiast:
var message = @$"GET {uri.AbsolutePath} HTTP/1.1
Accept: text/html, charset=utf-8
Connection: close
User-Agent: C# program
Host: {host}
" + "\r\n\r\n";

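The User-Agent header was required by websites such as Facebook and Instagram, which otherwise returned a 302 redirect to an "unsupported browser" page. The Accept header is also written here without the stray spaces.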
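One more thing that may contribute to inconsistent results: a verbatim string takes its line breaks from however the source file is saved, while HTTP/1.1 expects CRLF between header lines, and uri.AbsolutePath drops any query string. A minimal sketch of building the request with explicit line endings follows; the RawHttp class, the BuildGetRequest name, and the default User-Agent value are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Text;

public static class RawHttp
{
    // Builds a GET request with explicit CRLF line endings, so the result
    // does not depend on how the source file itself is saved.
    // (Hypothetical helper; name and defaults are assumptions.)
    public static string BuildGetRequest(Uri uri, string userAgent = "C# program")
    {
        var sb = new StringBuilder();
        // PathAndQuery keeps the query string, which AbsolutePath drops.
        sb.Append($"GET {uri.PathAndQuery} HTTP/1.1\r\n");
        sb.Append($"Host: {uri.Host}\r\n");
        sb.Append($"User-Agent: {userAgent}\r\n");
        sb.Append("Accept: text/html\r\n");
        sb.Append("Connection: close\r\n");
        // A blank line ends the header section.
        sb.Append("\r\n");
        return sb.ToString();
    }
}
```

The returned string can be passed to Encoding.UTF8.GetBytes and written to the SslStream exactly as in the question.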

301: not every website uses the www subdomain, so a request to the wrong host is redirected to the canonical one.

403/401: the most obvious case. Some resources simply aren't available unless you're authenticated.
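Because a raw socket client does not follow redirects on its own, the 301/302 cases have to be handled by hand. The following is only a sketch of one way to do that, assuming the whole response has already been read into a string as in the question; RedirectHelper and GetRedirectTarget are hypothetical names introduced for illustration.

```csharp
using System;

public static class RedirectHelper
{
    // Returns the redirect target when the raw response is a 3xx with a
    // Location header; returns null otherwise. (Hypothetical helper.)
    public static Uri GetRedirectTarget(string rawResponse, Uri requestUri)
    {
        // The header section ends at the first blank line.
        int headerEnd = rawResponse.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string head = headerEnd >= 0 ? rawResponse.Substring(0, headerEnd) : rawResponse;
        string[] lines = head.Split(new[] { "\r\n" }, StringSplitOptions.None);

        // The status line looks like "HTTP/1.1 301 Moved Permanently".
        string[] parts = lines[0].Split(' ');
        if (parts.Length < 2 || !int.TryParse(parts[1], out int status)) return null;
        if (status < 300 || status > 399) return null;

        foreach (string line in lines)
        {
            if (line.StartsWith("Location:", StringComparison.OrdinalIgnoreCase))
            {
                string value = line.Substring("Location:".Length).Trim();
                // Location may be relative, so resolve it against the request URI.
                return Uri.TryCreate(requestUri, value, out Uri target) ? target : null;
            }
        }
        return null;
    }
}
```

When the result is non-null, the caller can open a new connection to the target's host and repeat the request, ideally with a cap on the number of hops so that a redirect loop cannot run forever.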