How to find which column causes row duplication in JOIN statement?

2.2k views Asked by At

MS SQL Server. I have this kind of situation:

SELECT
    x1,
    x2,
    x3,
    x4,
    x5,
    x6,
    x7,
    x8,
    x9,
    x10,
    x11,
    x12,
    x13
FROM
    T1  -- this has a lot of columns
JOIN
    T2, -- this has a lot of columns
    T3, -- this has a lot columns as well
    T4, -- same here and other tables
    T5,
    T6,
    T7,
    T8,
    T9,
    T10,
    T11,
    T12,
    T13,
    T14

And I have some duplicated rows, so the column which contains different values is not among those 13 selected columns. I want to resolve the issue by finding which column in joined tables has varying value and then decide what to do with it.

The solution is to select T1.*, T2.*, etc. and carefully analyse results. The thing is, I'd have to select a lot of columns and spend time to do a repetitive task.

Is there a tool/procedure/query which would do it automatically for me? I believe it's a common task to do when developing complex database queries.

EDIT

I've found an excellent tool which can simplify analysis of a select * query results.

Dbeaver, a free universal database manager, has a "Calc" panel in results view. One can select whatever cells/rows/columns and it calculates count and distinct count of values on the fly. Those can be grouped by columns which is exactly what is needed when searching which column has distinct values. Super useful.

Reference: https://github.com/dbeaver/dbeaver/wiki/Panels

1

There are 1 answers

3
Ashutosh A On

One option is this -

SELECT
    x1, count (DISTINCT T1.*),
    x2, count (DISTINCT T2.*),
    x3, count (DISTINCT T3.*)
FROM
    T1  -- this has a lot of columns
JOIN
    T2, -- this has a lot of columns
    T3
GROUP BY
    x1,
    x2,
    x3

This will tell you which tables have repeating rows (the ones with count > 1)... you can then analyze the join condition for those tables