Querying InterSystems Caché through ODBC


I'm querying Caché for a list of tables in two schemas and looping through those tables to obtain a row count for each one. However, this is incredibly slow. For instance, counting 13 million records took 8 hours to return results. When I run the same count against an Oracle database with 13 million records (on the same network), it takes 1.1 seconds.

I'm using a BackgroundWorker to carry out the work off the UI thread (Windows Forms).

Here's the code I'm using with the Caché ODBC driver:

using (OdbcConnection odbcCon = new OdbcConnection(strConnection))
{
    try
    {
        odbcCon.Open();
        OdbcCommand odbcCmd = new OdbcCommand();

        foreach (var item in lstSchema)
        {
            odbcCmd.CommandText = "SELECT Count(*) FROM " + item;
            odbcCmd.Connection = odbcCon;
            AppendTextBox(item + " Count = " + Convert.ToInt32(odbcCmd.ExecuteScalar()) + "\r\n");
            int intPercentComplete = (int)((float)(lstSchema.IndexOf(item) + 1) / (float)intTotalTables * 100);
            worker.ReportProgress(intPercentComplete);
            ModifyLabel(" (" + (lstSchema.IndexOf(item) + 1) + " out of " + intTotalTables + " processed)");
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
        return;
    }
}
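One way to narrow down where the time goes is to time each COUNT(*) separately, for example with a Stopwatch. This is a minimal sketch assuming the same strConnection and lstSchema as above; the timing and console output are additions for illustration only:

using System;
using System.Data.Odbc;
using System.Diagnostics;

// Sketch: time each COUNT(*) individually to see whether the delay is
// per-table (server-side execution) or accumulated across the loop.
using (OdbcConnection odbcCon = new OdbcConnection(strConnection))
{
    odbcCon.Open();

    foreach (var item in lstSchema)
    {
        using (OdbcCommand odbcCmd = new OdbcCommand("SELECT COUNT(*) FROM " + item, odbcCon))
        {
            Stopwatch sw = Stopwatch.StartNew();
            int count = Convert.ToInt32(odbcCmd.ExecuteScalar());
            sw.Stop();

            Console.WriteLine("{0}: {1} rows in {2} ms", item, count, sw.ElapsedMilliseconds);
        }
    }
}

If each individual query is slow, the time is being spent on the server side rather than in the loop or the UI updates.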

Is the driver the issue?

Thanks.


There are 3 answers

Answer by Alex Roberts

Caché has a built-in query optimizer that determines how best to execute a query. Having indexes, especially bitmap indexes, will drastically help query times; a mere 13 million rows should take seconds at most. How much data is in each row? We have 260 million rows in many tables and 790 million rows in others, and we can scan through the whole thing in a couple of minutes. A non-indexed, complex query may take a day, though that is understandable.

Take a look at what's locking your globals. We have also discovered that queries apparently keep running even if the client disconnects. You can kill the task from the Management Portal, but the system doesn't seem to like running more than one large ODBC query at once, because such a query can take gigabytes of temp space. We use DBVisualizer over a JDBC connection.
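As a purely illustrative sketch of the indexing advice, a bitmap index could be created over the same ODBC connection used in the question; the schema, table, and column names here (MySchema.MyTable, Status, StatusIdx) are made-up placeholders, and the exact CREATE BITMAP INDEX syntax should be checked against your Caché version's SQL reference:

// Hypothetical example only: add a bitmap index before running the counts.
// odbcCon is assumed to be the open OdbcConnection from the question;
// MySchema.MyTable, Status and StatusIdx are placeholder names.
using (OdbcCommand idxCmd = new OdbcCommand(
    "CREATE BITMAP INDEX StatusIdx ON TABLE MySchema.MyTable (Status)", odbcCon))
{
    idxCmd.ExecuteNonQuery();
}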

Someone mentioned TuneTable; that's worth running if your table changes a lot, or at least a couple of times over the table's life, but it is NOT something you want to overuse. http://docs.intersystems.com/ens20151/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_optimizing has documentation and other useful information about this and about improving performance in general. If it's not fast, then something is broken.

Someone also mentioned that SELECT COUNT(*) will count an index (or computed properties) instead of scanning the table itself when one is available. This is related to the engine that compiles your SQL queries and decides on the most efficient way to get your data. There is a tool in the portal that shows how long a query takes and which alternative access methods the optimizer considered. You can see the Query Plan on the same page where you can execute SQL in the browser, mentioned below: /csp/sys/exp/UtilSqlQueryShowPlan.csp

RE: I can't run this query from within the Management Portal because the tables are only made available from within an application and/or ODBC.

That isn't actually true. Within the Management Portal, go to System Explorer, SQL, then Execute SQL Statements. Please note that you must have adequate privileges to see this; the %ALL role will grant access to everything. You can also run SQL queries natively in TERMINAL by executing do $system.SQL.Shell() and then typing your queries. This interface should be faster than ODBC, as I think it uses object access. Also keep in mind that embedded SQL and object access are the fastest ways to get at the data.

Please let me know if you have any more questions!

Answer by Stephen Canzano

I suppose the devil is in the details. Your code does

SELECT COUNT(*) FROM Table

If the table has no indices then I wouldn't be surprised if it is slower than you expect. If the table has indices, especially bitmap indices, I would expect this to be on par with Oracle.

The other thing to consider is how Caché is configured, i.e. how large the global buffers are and what the disk performance looks like.

Answer by Andrew

InterSystems Caché is slower for querying than any SQL database I have used, especially when you deal with large databases. Add ODBC overhead on top and you get even worse performance.

Some level of performance can be achieved through bitmap indexes, but often the only way to get good performance is to maintain extra, precomputed data, as described below.

You might even find that you can allocate more memory for the database (but that never seemed to do much for me).

For example, every time you add new data, have the application increment a counter somewhere for your count (or even multiple entries, to support grouping). Then you can get performance to a reasonable level.
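A minimal sketch of that idea, assuming an invented summary table called RowCounts that the application updates in the same transaction as each insert (all table and column names here are placeholders, not anything from the answer):

// Illustrative only: maintain a running row count in a summary table so that
// "how many rows?" becomes a single-row lookup instead of a full COUNT(*).
// odbcCon is assumed to be an open OdbcConnection.
using (OdbcTransaction tx = odbcCon.BeginTransaction())
{
    using (OdbcCommand insertCmd = new OdbcCommand(
        "INSERT INTO MySchema.MyTable (Name) VALUES (?)", odbcCon, tx))
    {
        insertCmd.Parameters.AddWithValue("@name", "example row");
        insertCmd.ExecuteNonQuery();
    }

    using (OdbcCommand bumpCmd = new OdbcCommand(
        "UPDATE RowCounts SET Total = Total + 1 WHERE TableName = 'MySchema.MyTable'",
        odbcCon, tx))
    {
        bumpCmd.ExecuteNonQuery();
    }

    tx.Commit();
}

Reading the count then becomes SELECT Total FROM RowCounts WHERE TableName = 'MySchema.MyTable', a single-row lookup regardless of how large the underlying table grows.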

I wrote a little InterSystems performance test post on my blog:

http://tesmond.blogspot.co.uk/2013/09/intersystems-cache-performance-woe-is-me.html