SQL Server - Substitute for OR when JOINing on non-indexed field

70 views Asked by At

Background:

One of our clients has a system where they have recordings of phone conversations between their agents and their customers, who they create various contracts with. The recordings are stored on a server with their locations saved in the Recordings table in the database. The agents can then "attach" a recording to a contract, which creates an entry in the ContractRecordings table. I have a need to create a report that shows which recordings are not attached to a contract but the way the tables are designed is making this more difficult than expected.

Recording
------------------------------
recordingId : INT PK, IDENTITY
agentId : INT, FK
filename : NVARCHAR(255)


ContractRecording
----------------------------------
recordingId : INT PK, IDENTITY
contractNumber : INT
created : DATETIME
username : NVARCHAR(20)
note : NVARCHAR(MAX)
fileLocation : NVARCHAR(max)

This would be easy if ContractRecording.recordingId was a foreign key reference to Recording.recordingId but it isn't. It's its own identity key. The only link between the tables is the file location but Recording.filename only stores the filename whereas ContractRecording.fileLocation stores the full path. Yeah, I know but I didn't design these tables. Luckily, there is a pattern and the full path is derived from the Agent's name and the date of the recording, both of which we can know from data in the Recording table. But of course, there's another problem: the format for the file path changed a year or so ago and some recordings are stored under the old format and some under the new format.

Old Format: C:\John-Recordings\2015\06\15-0811.wav

New Format: C:\Recordings\John Smith\2015\06\15-0811.wav

The Problem:

In order to link the two tables, I have to join them on the full path of the recordings, which must be constructed manually on the Recording table and can be in one of two formats. I originally tried using an OR in the JOIN clause but that takes around 8 minutes to return back around 15k rows, which is not acceptable. I then tried using two LEFT OUTER JOINs - one for each condition - but that took ten minutes to pull what seemed to be the same data. I imagine that's because I'm joining on a custom field, which isn't indexed. Splitting it into two SELECTs and using a UNION resulted in duplicate rows each query would return one row for every recording. Do I have any other options to get this query down to under a few seconds? Here's my original query using the OR clause.

SELECT * FROM
    (SELECT 
        cr.recordingid AS "attachedrecordingid"
        ,rec.recordingid AS "rawrecordingid"
        ,cr.contractnumber
        ,cr.created
        ,rec.name
        ,cr.note
        ,cr.filelocation
        ,rec.filename
        ,rec.recordtime
    FROM ContractRecording cr
    RIGHT OUTER JOIN
    (SELECT 
        recordingid
        ,a.name
        ,filename
        ,retain
        ,r.recordtime
        ,'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathnew"
        ,'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathold"
    FROM Recording r
    JOIN Agents a
        ON r.agentid = a.agentid) rec
    ON cr.filelocation = rec.fullpathold OR cr.filelocation = rec.fullpathnew) main
ORDER BY main.name, main.recordtime

The report needs to show one row for all records in the Recording table (unless a single recording is attached to multiple contracts, in which case it should show one row per pairing), with data from ContractRecording displayed if there are any rows that match on either of the filelocation formats.

If absolutely necessary, I'm not opposed to just pulling all the data from both tables and linking them through code but that would be a last resort.

UPDATE:

As requested, here's the UNION version of the query for analysis. As mentioned, it returns two rows for every pairing - one with data and one without. This is because at least one of the two JOINs is always going to have no matches but I only want to ignore those when the other JOIN does have a match. If neither JOIN matches, I also only want to display it once. I'm less confident I can achieve my desired result with a UNION than I am with other possibilities so I haven't pursued this approach.

SELECT * FROM
    ((SELECT 
        cr.recordingid AS "attachedrecordingid"
        ,rec.recordingid AS "rawrecordingid"
        ,cr.contractnumber
        ,cr.created
        ,rec.name
        ,cr.note
        ,cr.filelocation
        ,rec.filename
        ,rec.recordtime
    FROM ContractRecording cr
    RIGHT OUTER JOIN
    (SELECT 
        recordingid
        ,a.name
        ,filename
        ,retain
        ,r.recordtime
        ,'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathold"
    FROM Recording r
    JOIN Agents a
        ON r.agentid = a.agentid) rec
    ON cr.filelocation = rec.fullpathold)
    UNION
    (SELECT 
        cr.recordingid AS "attachedrecordingid"
        ,rec.recordingid AS "rawrecordingid"
        ,cr.contractnumber
        ,cr.created
        ,rec.name
        ,cr.note
        ,cr.filelocation
        ,rec.filename
        ,rec.recordtime
    FROM ContractRecording cr
    RIGHT OUTER JOIN
    (SELECT 
        recordingid
        ,a.name
        ,filename
        ,retain
        ,r.recordtime
        ,'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathnew"
    FROM Recording r
    JOIN Agents a
        ON r.agentid = a.agentid) rec
    ON cr.filelocation = rec.fullpathnew)) main
ORDER BY main.name, main.recordtime
1

There are 1 answers

2
JamieD77 On BEST ANSWER

You can try using LIKE

WITH AgentRecordings AS
(
    SELECT  
        a.name,
        r.recordingId AS rawrecordingid,
        r.filename,
        r.recordtime,
        CONCAT(
            'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + FILENAME,
            'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
        ) AS filepaths
    FROM
        Agents a
        JOIN Recording r ON a.agentId = r.agentId
)
SELECT
    cr.recordingid AS "attachedrecordingid"
    ,rec.recordingid AS "rawrecordingid"
    ,cr.contractnumber
    ,cr.created
    ,rec.name
    ,cr.note
    ,cr.filelocation
    ,rec.filename
    ,rec.recordtime
FROM
    AgentRecordings rec
    LEFT JOIN ContractRecording cr ON rec.filepaths LIKE '%' + cr.filelocation + '%'

If this helps any.. I would also try creating a temp table instead of using a cte and see if that helps more.

You can also try splitting the two OR statements into 2 cte's and using a union to combine the recording id's that were found

WITH fullpathnew AS
(
    SELECT  cr.recordingid AS "attachedrecordingid",
            rec.recordingid AS "rawrecordingid",
            cr.contractnumber,
            cr.created,
            cr.note,
            cr.filelocation
    FROM    Agents a
            JOIN Recording r ON a.agentId = r.agentId
            JOIN ContractRecording cr ON cr.filelocation = 'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
),
fullpathold AS
(
    SELECT  cr.recordingid AS "attachedrecordingid",
            rec.recordingid AS "rawrecordingid",
            cr.contractnumber,
            cr.created,
            cr.note,
            cr.filelocation
    FROM    Agents a
            JOIN Recording r ON a.agentId = r.agentId
            JOIN ContractRecording cr ON cr.filelocation = 'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
)
combinedCtes AS
(
    SELECT attachedrecordingid, rawrecordingid, contractnumber, created, note, filelocation FROM fullpathnew
    UNION SELECT attachedrecordingid, rawrecordingid, contractnumber, created, note, filelocation FROM fullpathold
)
SELECT  cte.attachedrecordingid
        ,r.recordingid AS "rawrecordingid"
        ,cte.contractnumber
        ,cte.created
        ,a.name
        ,cte.note
        ,cte.filelocation
        ,r.filename
        ,r.recordtime
FROM    Agents a
        JOIN Recording r ON r.agentId = a.agentId
        LEFT JOIN combinedCtes cte ON r.recordingid = cte.rawrecordingid

Your UNION needs to be in a sub select and then you can left join to that subquery

SELECT  j.attachedrecordingid
        ,r.recordingid AS rawrecordingid
        ,j.contractnumber
        ,j.created
        ,a.NAME
        ,j.note
        ,j.filelocation
        ,r.filename      
        ,r.recordtime
FROM    Agents a
        JOIN Recording r ON a.agentId = r.agentId
        LEFT JOIN(
            SELECT  cr.recordingid AS "attachedrecordingid"
                    ,rec.recordingid AS "rawrecordingid"
                    ,cr.contractnumber
                    ,cr.created
                    ,cr.note
                    ,cr.filelocation
            FROM    Agents a 
                    JOIN Recording r
                    JOIN ContractRecording cr 
                        ON cr1.filelocation = 'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
            UNION
            SELECT  cr.recordingid AS "attachedrecordingid"
                    ,rec.recordingid AS "rawrecordingid"
                    ,cr.contractnumber
                    ,cr.created
                    ,cr.note
                    ,cr.filelocation
            FROM    Agents a 
                    JOIN Recording r
                    JOIN ContractRecording cr 
                        ON cr1.filelocation = 'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename


        ) j ON r.recordingId = j.rawrecordingid
ORDER BY a.name, r.recordtime