UniData UniQuery - two WITH

2.1k views Asked by At

Alright I have little to no knowledge of SQL language, and am wondering what are the possible reasons for the slowness of two WITH vs one WITH in unidata.

Database has around ~1 million rows.

Ie/

SELECT somewhere WITH Column1 = "str" AND WITH Column2 = "Int" 5< minutes

Compared to

SELECT somewhere WITH Column1 = "str" ~1 second

somewhere is indexed (from my knowledge)

so is there anything I'm doing wrong?

If more information is required just ask, not sure what to supply.

Also whats the difference between WITH and WHERE?

2

There are 2 answers

1
Dan McGrath On

This isn't SQL, it is UniQuery.

To clarify it for you, you can't index the file (somewhere, in this case), only the columns of the file. You might find Column1 is indexed and Column2 is not. Type in LIST.INDEX somewhere to find out what columns have been indexed.

For your question, you have only compared selecting on Column1 against selecting on Column1 & Column2 and assumed the vastly slower response is purely because you selected on 2 columns. Your next text should have been to select only on Column2 and seen how slow that was.

There are are many possible reasons to explain the difference in response, aside from indexing. In UniData columns are defined as 'dictionary items' There are different types of dictionary items. The most basic is a D-type dictionary item which is just a direct reference to a field in the record. Another type is the I or V-type, which is a derived field. The derived field can be as simple as returning a constant or as complex as performing an equivalent performing a JOIN with another file and/or some form of complex calculation. This this is should be simple to see that different columns can take vastly different amounts of processing to handle.

Other reasons are how deep in the record the column is (first field references will be faster than fields later in the record) as well as potential query caching that can affect the timings of your SELECTs.

For more information, check out the database's manuals at Rocket Software.

0
DaveB On

A single column SELECT on an indexed field will not even require that any data file records are read. If you look under the hood, you'll see that the index file is a normal hash file, and the single column SELECT will simply mean that the index file record with the key "str" is read. This could return thousands and thousands of keys in less than a second.

Once you add the second column, you are probably forcing the system to read all of those thousands and thousands of records, EVEN IF THE SECOND COLUMN IS INDEXED. This is going to take a measurable amount of more time.

In general, an index on a field with a small number of unique values is of dubious use. If the second column contains data that has a large number of possible values, leading to a smaller number of records with each particular index value, then it would be best to arrange the SELECT such that the index used is on the second column. I'm not sure, but it might be possible to simply reverse the order of the columns in the SELECT statement to do this. Otherwise you might need to run two SELECT statements back to back.

As an example, assume that the file has 600,000 records with Column1 = "str", and 2,000 records with Column2 = "int":

>SELECT somewhere WITH Column2 = "int"
>>SELECT somewhere with Column1 = "str"

Will read 2,000 records and should return almost instantly.

If the combination of Column1 and Column2 is something that you'll be SELECTing on frequently, then you might want to create a new dictionary item that combines the two, and build an index on that.

That being said, it shouldn't take a U2 system 5 minutes to run through a file of a million records. There's a very good chance that the file has become badly overflowed, and needs to be resized with a larger modulo to improve performance.