C# - Using ThreadPool to call WebClient.DownloadFileAsync multiple times


Hi, I'm new to multithreading and I'm struggling to download multiple files from the web using DownloadFileAsync. There are about 400 files to download, and I prepared the URLs to request with the WebClient class. I call DownloadFileAsync from the ThreadPool, hoping it will be faster than downloading serially. The URLs look like this, with the item number (104, 105, etc.) changing for each one:

http://medicarestatistics.humanservices.gov.au/statistics/do.jsp?_PROGRAM=%2Fstatistics%2Fmbs_item_standard_report&DRILL=ag&group=104&VAR=services&STAT=count&RPT_FMT=by+state&PTYPE=month&START_DT=202101&END_DT=202101

My code looks like this:

        foreach(var d in infolist)
        {
            string itemtype = d.Key;
            Dictionary<string, string> folderAndurl = d.Value;
            foreach (var itemcode in itemcodes)
            {
                foreach (var date in dates)
                {
                    filename = folderAndurl["folder"] + date + "_" + itemcode + ".xls";
                    url = folderAndurl["url"].Replace("XXX", itemcode).Replace("STDATE", date);

                    ThreadPool.UnsafeQueueUserWorkItem(new WaitCallback(DownloadWebAsync), new object[] { filename, url });
                    //ThreadPool.QueueUserWorkItem(new WaitCallback(DownloadWebAsync), new object[] { filename, url });
                }
            }
        }

And DownloadWebAsync is as below:

    private void DownloadWebAsync(object state)
    {
        object[] list = state as object[];
        string filename = Convert.ToString(list[0]);
        string url = Convert.ToString(list[1]);

        WebClient client = new WebClient();
        Uri uri = new Uri(url);
        client.DownloadFileCompleted += new AsyncCompletedEventHandler(Client_DownloadFileCompleted);
        client.QueryString.Add("file", filename);
        client.QueryString.Add("url", url);
        client.DownloadFileAsync(uri, filename);

        //throw new NotImplementedException();
    }
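A side note on carrying the file name and URL to the completion handler: `WebClient.QueryString` is documented as query parameters for the request, not as a place to keep per-download state. `DownloadFileAsync` also has an overload that takes a `userToken` object, which comes back as `e.UserState` in the `DownloadFileCompleted` handler. The sketch below shows that overload; the `DownloadJob` and `TokenDownload` types and their members are hypothetical names introduced only for illustration.

```csharp
using System;
using System.ComponentModel;
using System.Net;

// Hypothetical state object: one instance per file to download.
public sealed class DownloadJob
{
    public string FileName { get; }
    public string Url { get; }

    public DownloadJob(string fileName, string url)
    {
        FileName = fileName;
        Url = url;
    }
}

public static class TokenDownload
{
    // Starts one asynchronous download and passes the job as the user token.
    // The caller's handler should dispose the WebClient (the sender) when done.
    public static WebClient Start(DownloadJob job, AsyncCompletedEventHandler onCompleted)
    {
        var client = new WebClient();
        client.DownloadFileCompleted += onCompleted;
        // Third argument is the userToken; it is exposed as e.UserState later.
        client.DownloadFileAsync(new Uri(job.Url), job.FileName, job);
        return client;
    }

    // Builds a status line from the event args, reading the job back from UserState.
    public static string Describe(AsyncCompletedEventArgs e)
    {
        var job = e.UserState as DownloadJob;
        string name = job != null ? job.FileName : "(unknown)";
        if (e.Cancelled) return name + ": cancelled";
        if (e.Error != null) return name + ": failed (" + e.Error.Message + ")";
        return name + ": completed";
    }
}
```

A handler such as `Client_DownloadFileCompleted` could then call `TokenDownload.Describe(e)` to get the text for the UI, instead of reading the values back out of `QueryString`.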

When the ThreadPool starts, I can see that multiple blank files are created on disk straight away, as shown in the image below. They all start at 0 KB, so I'm assuming all the ThreadPool threads are running and sending their requests to the website.

screenshot

However, it appears that the files on disk are filled with downloaded data one at a time, or at most two at a time (mostly one). I expected the 0 KB files to be updated simultaneously: at least 3 or 4 files should be in progress at any point, since the threads that called DownloadFileAsync are already running. I have no idea whether I'm doing something wrong in the code or whether some property needs to be set. My goal is simultaneous downloads to reduce the total download time, but that is not happening right now.
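One property that may be worth checking, assuming the program targets .NET Framework: `ServicePointManager.DefaultConnectionLimit` caps the number of concurrent connections to a single host, and for desktop and console applications it defaults to 2. Every URL here points at the same host, so that cap would be consistent with only one or two files growing at a time. A minimal sketch of raising it follows; the helper name `ConnectionSetup.RaiseLimit` and the value 8 in the usage note are illustrative assumptions, and whether the server tolerates more parallel requests is not something the post establishes.

```csharp
using System;
using System.Net;

public static class ConnectionSetup
{
    // Hypothetical helper: raises the per-host connection cap used by
    // WebClient and HttpWebRequest. Call it once, before the first request,
    // because connection groups that already exist keep their old limit.
    public static int RaiseLimit(int limit)
    {
        if (limit < 1) throw new ArgumentOutOfRangeException(nameof(limit));
        ServicePointManager.DefaultConnectionLimit = limit;
        return ServicePointManager.DefaultConnectionLimit;
    }
}
```

Calling `ConnectionSetup.RaiseLimit(8)` at startup, before the loop that queues the downloads, would be one way to try it. On newer .NET versions `HttpClient` does not go through `ServicePointManager`, so this sketch applies only under the .NET Framework assumption.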

Another reason I'm using the ThreadPool is that I'm writing the status, URL, and download size back to a UI window, and I don't want the UI to become unresponsive while the 400 files download.

I have also tested with Thread, ThreadPool, and the Task Parallel Library, and with both WebClient and HttpClient (async/await). In all cases, the blank files are created as soon as the threads or tasks start, but the actual downloads happen one at a time. I also tested WebClient.DownloadFile, but timeout errors occur when it runs through the ThreadPool, so I have to use the async version.
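For reference, the async/await variant can be written so that only a fixed number of downloads are in flight at once, which avoids creating all 400 empty files up front. The sketch below uses `SemaphoreSlim` as the gate; `ThrottledDownloads.RunAsync` and its `downloadOne` delegate are hypothetical names, and the delegate stands in for whatever actually fetches one file (for example a call built on `HttpClient.GetStreamAsync`). It only bounds concurrency and does not by itself lift any per-host connection cap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloads
{
    // Runs downloadOne for every (url, fileName) pair, with at most
    // maxParallel operations in flight at any moment.
    // downloadOne is a placeholder for the real per-file download.
    public static async Task RunAsync(
        IEnumerable<KeyValuePair<string, string>> jobs,
        Func<string, string, Task> downloadOne,
        int maxParallel)
    {
        if (maxParallel < 1) throw new ArgumentOutOfRangeException(nameof(maxParallel));

        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = jobs.Select(async job =>
            {
                // Wait for a free slot without blocking a thread.
                await gate.WaitAsync();
                try
                {
                    await downloadOne(job.Key, job.Value);
                }
                finally
                {
                    // Free the slot even if the download failed.
                    gate.Release();
                }
            }).ToList();

            // Completes when every job has finished; rethrows a failure if any job faulted.
            await Task.WhenAll(tasks);
        }
    }
}
```

Because nothing here blocks a thread while waiting, the method can be awaited directly from a UI event handler, and status updates written after each `await` inside `downloadOne` would run without freezing the window.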

Could someone please explain whether this is expected behaviour, and how I can improve the download experience? I have been struggling with this for nearly a week, and your help is greatly appreciated.

Regards
