SQL Left Join unindexed table and get one match only for better performance

54 views Asked by At

I have table A and table B

Table A

date       | id | content
--------------------------
2023-12-02 | 1  | a 
....

Table B

date       | id | size
--------------------------
2023-12-02 | 1  | 10  
2023-12-02 | 1  | 11 
....

I want table A to left join table B on id, and return the value of size in any row. So both of the following are acceptable

date       | id | content | size
------------------------------------
2023-12-02 | 1  | a       | 10  

or

date       | id | content | size
------------------------------------
2023-12-02 | 1  | a       | 11

My question is that, while A is partitioned on date and uses id as index, b is partitioned on date and without any index. So joining with table b is slow if I use a naive left join.

SELECT A.*, B.size 
FROM A
LEFT JOIN B ON A.id = b.id
WHERE A.date = '2023-12-02'
  AND B.date = '2023-12-02'

Is there anyway I can write the query so it will execute faster?

1

There are 1 answers

0
T N On

Lacking a specific DBMS reference in your question, I am going to answer for SQL server. This answer might be applicable to other DBMS's.

To select the best matching row from another table, you can use the CROSS APPLY (SELECT TOP 1 ... ORDER BY ...) pattern. In your case, since you don't care about which record is selected, you can drop the order-by.

A CROSS APPLY is like a join to a subselect. If you want to allow for a no-matching-records case, you can use an OUTER APPLY which is like a left-join.

Your query would then become:

SELECT A.*, B1.size
FROM A
OUTER APPLY (
    SELECT TOP 1 *
    FROM B
    WHERE B.id = A.id
    AND B.date = '2023-12-02'  -- Or possibly B.date = A.date
    -- ORDER BY intentionally omitted
) B1
WHERE A.date = '2023-12-02'

Of course this probably still depends on having an index (or partition/index combination) on B(date, id). The obvious fix is to add a supporting index. (You didn't mention why you haven't done so.)

If you have many rows in A and are looking up just as many corresponding rows in B, you could preprocess the contents of B to eliminate all but one row per id/date combination. This could be done with a subselect or common table expression (CTE) that uses the ROW_NUMBER() window function, removes excess rows, and then feeds the main join.

Something like:

SELECT A.*, B1.size
FROM A
LEFT JOIN (
    SELECT BN.*
    FROM (
        SELECT
            *,
            ROW_NUMBER() OVER(PARTITION BY B.id, B.date /*Order-by omitted*/) AS RowNum
        FROM B
    ) BN
    WHERE BN.RowNum = 1
) B1
    ON B1.id = A.id
    AND B1.date = '2023-12-02'  -- Or possibly B.date = A.date
WHERE A.date = '2023-12-02'

Feeding the subselect contents into a #temp table and adding an index on that #temp table may also be an option.