I need to improve the performance of a desktop application (.NET) that reads a database and creates XML files based on XBRL (eXtensible Business Reporting Language). It uses UBMatrix to create the XBRL taxonomies.
The application works fine when the data set is small, but it takes more than 30 minutes to generate the files when the data is large. The client's data is always large, so the application always needs a long time to generate the files.
My task is to optimise the application in order to reduce the time taken to create the XML files. When I checked the application, I found that it works like this:
Start
- Create a connection to the db
- Get the first set of data (this table (table1) is very large; the query returns around 15-30 K rows into a DataTable)
- for loop from 0 to dataTable.Rows.Count
- check some condition
- get data from the db (this table (table2) is even larger than table1)
- send the data to form the XBRL and write it to XML (this is done by a third-party component called UBMatrix; it is not possible to edit the code which creates the XBRL-XML file)

Similarly, 3 to 4 more sets of data are processed in the same way. A sketch of this flow is given below.
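In code, the current flow looks roughly like this (only a sketch; MeetsCondition, WriteXbrl, the column names and the SQL are hypothetical stand-ins for code the question does not show):

    using System.Data;
    using System.Data.SqlClient;

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        var table1 = new DataTable();
        using (var da = new SqlDataAdapter("SELECT QuestionId /*, ... */ FROM table1", conn))
            da.Fill(table1);                          // 15-30 K rows

        for (int i = 0; i < table1.Rows.Count; i++)
        {
            DataRow row = table1.Rows[i];
            if (!MeetsCondition(row)) continue;       // "checks some condition"

            // One extra round trip per row: the classic N+1 query pattern.
            var details = new DataTable();
            using (var cmd = new SqlCommand(
                "SELECT * FROM table2 WHERE QuestionId = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", row["QuestionId"]);
                using (var da = new SqlDataAdapter(cmd))
                    da.Fill(details);
            }

            WriteXbrl(row, details);                  // UBMatrix call; cannot be edited
        }
    }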
In my observation, we can avoid the db calls inside the for loop by getting all the data before the loop. When I checked the queries, there were subqueries; patterns like NOT EXISTS (SELECT * FROM table) can be replaced with joins or NOT EXISTS (SELECT 1 FROM table).
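One way to do that, assuming the per-row query is keyed on a single column (a hypothetical QuestionId here, since the real schema is not shown), is to fetch table2 once and index it in memory; ILookup conveniently handles multiple child rows per key:

    using System.Data;
    using System.Data.SqlClient;
    using System.Linq;

    // Fetch the child rows once instead of once per parent row.
    var table2 = new DataTable();
    using (var conn = new SqlConnection(connectionString))
    using (var da = new SqlDataAdapter("SELECT * FROM table2", conn))
        da.Fill(table2);

    // Index by the join key (hypothetical column name). Requires .NET 3.5
    // for LINQ and a reference to System.Data.DataSetExtensions.
    ILookup<int, DataRow> detailsByQuestion = table2.AsEnumerable()
        .ToLookup(r => r.Field<int>("QuestionId"));

    // table1 loaded up front as before.
    foreach (DataRow row in table1.AsEnumerable())
    {
        if (!MeetsCondition(row)) continue;
        var details = detailsByQuestion[row.Field<int>("QuestionId")]; // no db call
        WriteXbrl(row, details);   // same per-row work, now fed from memory
    }

Loading all of table2 into memory only works if it fits, of course; if it does not, a single joined query that returns the parent and child columns together and is streamed through a DataReader avoids both the memory pressure and the per-row round trips.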
But the application still needs to process the rows in a loop, so I am also thinking of using threading: create threads based on the size of the data and process the chunks simultaneously.
For example:
- If there are 100 rows, there will be 100 entries in the XML (XBRL) file.
- So I would split them into 50 and 50 and run them in two threads, which would generate two XML files; at the end I would combine the two into one XML file.

That way the processing of the 0th question and the 50th question can start at the same time, whereas in the current for loop the 99th row is only processed at the very end. I am not sure about this idea (a sketch is below). Can anyone suggest/share your ideas? Any help will be appreciated. Thanks in advance.
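A minimal sketch of that split-and-merge idea with plain threads (no .NET 4 required); whether it is safe depends entirely on whether the UBMatrix component can be used from two threads at once (for example one instance per thread), which has to be verified with the vendor. MergeXbrlFiles is a hypothetical helper:

    using System.Data;
    using System.Threading;

    static void ProcessInTwoThreads(DataTable table1)
    {
        int half = table1.Rows.Count / 2;

        // Each thread writes its own file so they never share a writer.
        var t1 = new Thread(() => ProcessRange(table1, 0, half, "part1.xml"));
        var t2 = new Thread(() => ProcessRange(table1, half, table1.Rows.Count, "part2.xml"));
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();

        // Hypothetical merge step; see the caveat below.
        MergeXbrlFiles("part1.xml", "part2.xml", "final.xml");
    }

    static void ProcessRange(DataTable table, int from, int to, string outputFile)
    {
        for (int i = from; i < to; i++)
        {
            DataRow row = table.Rows[i];
            // ... the same per-row work as in the original loop,
            // using only data that was fetched before the threads started.
        }
    }

Two caveats: two XBRL instance documents cannot simply be concatenated, since contexts and units would have to be de-duplicated during the merge, so that step is not free; and the threads must only read data fetched up front, because sharing one open SqlConnection between threads is not safe.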
30k queries in 30 minutes is just 16 queries per second. That is not very much unless the queries are expensive.
To find out, run SQL Profiler and check the execution time of each query, then multiply by the number of queries. If that total is reasonably close to 30 minutes, you are lucky: you can rewrite all those queries to a join and put the result in a Dictionary or an ILookup.

If you need to resort to multithreading, check whether you have the possibility to upgrade to .NET 4. Then you can use Parallel.ForEach or some other suitable method in the TPL to parallelize your work.
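For completeness, a minimal Parallel.ForEach sketch on .NET 4, reusing the hypothetical detailsByQuestion lookup and WriteXbrl placeholder from the sketches above; as before, the UBMatrix call must be thread-safe (or one instance per thread) for this to be valid:

    using System;
    using System.Data;
    using System.Linq;
    using System.Threading.Tasks;

    var rows = table1.AsEnumerable().Where(MeetsCondition).ToList();

    Parallel.ForEach(
        rows,
        new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
        row =>
        {
            // Look up pre-fetched child rows; never hit the db inside the loop body.
            var details = detailsByQuestion[row.Field<int>("QuestionId")];
            WriteXbrl(row, details);
        });

Note that Parallel.ForEach does not preserve ordering, so if the XBRL entries must appear in row order, either write per-partition files and merge them in order, or collect the results and write them out sequentially afterwards.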