Multiple web requests through a string array

I have over 2000 url calls to make and with the code below it is taking almost 2 minutes to complete. Could someone help me to speed the process up?

private void button4_Click(object sender, EventArgs e)
    {
        WebRequest req;
        WebResponse res;
        string[] lines = File.ReadAllLines(@"c:\data\temp.txt");
        for (int i = 0; i < lines.Count(); i++)
        {
            req = WebRequest.Create(lines[i]); 
            res = req.GetResponse();
            StreamReader rd = new StreamReader(res.GetResponseStream(), Encoding.ASCII);
            rd.Close();
            res.Close();
            textBox1.Text += ".";
        }
    } 

Many thanks

There are 3 answers

0
Panagiotis Kanavos On

Since you don't specify a framework version I'll assume you are using at least 4.5.

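You can use an ActionBlock (from the TPL Dataflow library, System.Threading.Tasks.Dataflow) to execute multiple calls concurrently. By default an ActionBlock processes one message at a time on a ThreadPool thread; setting MaxDegreeOfParallelism lets it process several messages concurrently.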

You could use something like this:

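var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10
};

// Pass the options to the block; otherwise it processes one URL at a time.
var block = new ActionBlock<string>(url =>
{
    // WebRequest is not IDisposable, so only the response goes in a using block.
    var req = WebRequest.Create(url);
    using (var res = req.GetResponse())
    {
        // Process the response here
    }
}, options);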

string[] lines = File.ReadAllLines(@"c:\data\temp.txt");
foreach(var line in lines)
{
    block.Post(line);
}

block.Complete();

await block.Completion;

You can control how many requests are made concurrently by changing the MaxDegreeOfParallelism property.

You can also call GetResponseAsync to execute the request asynchronously. This won't make them go faster but it will reduce the number of ThreadPool threads used to serve the same number of requests. This means that less CPU is wasted while blocking and context switching.

var block = new ActionBlock<string>(async url =>
{
    // WebRequest is not IDisposable, so only the response goes in a using block.
    var req = WebRequest.Create(url);
    using (var res = await req.GetResponseAsync())
    {
        // Process the response here
    }
}, options);

Disposing the responses is important. Unless you dispose the response, the connection to the server remains active. By default, .NET Framework allows only 2 concurrent connections per host (not per URL), so orphaned responses can cause delays until the garbage collector runs and collects them. While you can raise the limit, it's best to always dispose of the responses.
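As a rough illustration of overriding that limit: on .NET Framework the default per-host connection limit is exposed as ServicePointManager.DefaultConnectionLimit. The sketch below assumes a .NET Framework client application. The value 10 and the Configure helper are only illustrative and are not part of the original answer.

```csharp
using System;
using System.Net;

public static class ConnectionLimitExample
{
    // Hypothetical helper: raises the default per-host connection limit
    // and returns the value now in effect.
    public static int Configure(int limit)
    {
        // Set this before creating any requests; connections to hosts that
        // were already contacted keep the limit they were created with.
        ServicePointManager.DefaultConnectionLimit = limit;
        return ServicePointManager.DefaultConnectionLimit;
    }

    public static void Main()
    {
        // 10 is an assumed value chosen to match MaxDegreeOfParallelism above.
        Console.WriteLine(Configure(10));
    }
}
```

Raising the limit only helps if the degree of parallelism is at least as high, and it is not a substitute for disposing the responses.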

1
Adriano Repetti On

You can't speed things up much because the bottleneck is your Internet connection. However, there are a few things you can do:

1) Don't use LINQ's Count() on the lines. It's an array, so its size is known through the Length property (a micro-optimization, you won't ever notice this change).

2) Use using to release disposable objects. This has nothing to do with speed but gives better error handling: if something goes wrong in your code, resources are released immediately instead of waiting for the GC.

3) Make the requests parallel. This will speed things up a little:

private void button4_Click(object sender, EventArgs e)
{
    var lines = File.ReadAllLines(@"c:\data\temp.txt");

    var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
    Parallel.ForEach(lines, options, line => 
    {
        var request = WebRequest.Create(line);

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.ASCII))
        {
            // Do your stuff

            BeginInvoke(new MethodInvoker(delegate 
            {
                textBox1.Text += ".";
            }));
        }
    });
} 

A few more notes:

  • MaxDegreeOfParallelism sets the maximum number of concurrent requests. Adding more concurrent connections won't speed things up indefinitely and may even slow things down. A few trials will help you find a reasonable value.

  • There is no error checking here, yet network operations may fail temporarily and then work as expected after a short delay. I suggest also reading "System.Net.WebException: The remote name could not be resolved" and this for I/O operations.
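To run the trials mentioned in the first note, something along these lines could be used to compare a few values of MaxDegreeOfParallelism. This is only a sketch: SimulatedRequest is a hypothetical stand-in that sleeps for 50 ms instead of making a real web request, and the item count and the degrees tried are arbitrary assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismTrial
{
    // Hypothetical stand-in for one web request; replace with the real call.
    public static void SimulatedRequest(int item)
    {
        Thread.Sleep(50);
    }

    // Runs all items with the given degree of parallelism and returns the elapsed time.
    public static TimeSpan Measure(int degree, int[] items)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
        var watch = Stopwatch.StartNew();
        Parallel.ForEach(items, options, item => SimulatedRequest(item));
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        var items = Enumerable.Range(0, 40).ToArray();

        // Try a few candidate values and compare the timings.
        foreach (var degree in new[] { 1, 2, 4, 8 })
        {
            var elapsed = Measure(degree, items);
            Console.WriteLine("Degree {0}: {1:F0} ms", degree, elapsed.TotalMilliseconds);
        }
    }
}
```

With real requests the timings depend on the network and the remote servers, so it is worth repeating each trial a few times before settling on a value.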

To make it a more complete example, your click event handler will be:

private void button4_Click(object sender, EventArgs e)
{
    var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
    Parallel.ForEach(ReadUrlList(@"c:\data\temp.txt"), options, ProcessUrl);
}

Actual code to process each URL and to read URL list:

private static string[] ReadUrlList(string path)
{
    return File.ReadAllLines(path);
}

private void ProcessUrl(string url)
{
    ProcessResponse(url, response =>
    {
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.ASCII))
        {
            // Do your stuff

            // We're working on separate threads, to access UI we
            // have to dispatch the call to UI thread. Note that
            // code will be executed asynchronously then local
            // objects may have been disposed!
            BeginInvoke(new MethodInvoker(delegate 
            {
                textBox1.Text += ".";
            }));
        }
    });
} 

With this helper method to hide the retry/wait pattern for network operations:

private static void ProcessResponse(string url, Action<WebResponse> action) 
{
    for (int i=1; i <= NumberOfRetries; ++i) 
    {
        try 
        {
            var request = WebRequest.Create(url);

            using (var response = request.GetResponse()) 
            {
                action(response);
            }

            break;
        }
        catch (Exception e) 
        {
            if (i == NumberOfRetries)
                throw;

            Thread.Sleep(DelayOnRetry);
        }
    }
}

private const int NumberOfRetries = 3;
private const int DelayOnRetry = 1000; // milliseconds

0
Enigmativity On

I'm going to suggest that you use Microsoft's Reactive Framework (Rx) for this. Install the NuGet packages "Rx-Main" and "Rx-WinForms" (or "Rx-WPF").

Here's what the code would look like:

private void button4_Click(object sender, EventArgs e)
{
    var query =
        from line in File.ReadAllLines(@"c:\data\temp.txt").ToObservable()
        from result in Observable.Defer(() =>
        {
            var req = WebRequest.Create(line);
            return
                Observable.Using(
                    () => req.GetResponse(),
                    res => Observable.Using(
                        () => new StreamReader(res.GetResponseStream(), Encoding.ASCII),
                        st => Observable.Start(() => st.ReadToEnd())));
        })
        select new { line, result };

    query
        .ObserveOn(textBox1)
        .Subscribe(x => textBox1.Text += ".");
}

I have assumed that you are trying to read a string from the stream.

This code nicely disposes of all intermediate objects. It also correctly multithreads the requests, then marshals the results to the UI thread and updates the text box text.

A slightly cleaner version of this code is this:

private void button4_Click(object sender, EventArgs e)
{
    var query =
        from line in File.ReadAllLines(@"c:\data\temp.txt").ToObservable()
        from result in Observable.Using(
            () => new WebClient(),
            wc => Observable.Start(() => wc.DownloadString(new Uri(line))))
        select new { line, result };

    query
        .ObserveOn(textBox1)
        .Subscribe(x => textBox1.Text += ".");
}

It uses WebClient for the download. It still multithreads as required.