Background:
One of our clients has a system where they have recordings of phone conversations between their agents and their customers, who they create various contracts with. The recordings are stored on a server with their locations saved in the Recordings table in the database. The agents can then "attach" a recording to a contract, which creates an entry in the ContractRecordings table. I have a need to create a report that shows which recordings are not attached to a contract but the way the tables are designed is making this more difficult than expected.
Recording
------------------------------
recordingId : INT PK, IDENTITY
agentId : INT, FK
filename : NVARCHAR(255)
ContractRecording
----------------------------------
recordingId : INT PK, IDENTITY
contractNumber : INT
created : DATETIME
username : NVARCHAR(20)
note : NVARCHAR(MAX)
fileLocation : NVARCHAR(max)
This would be easy if ContractRecording.recordingId was a foreign key reference to Recording.recordingId but it isn't. It's its own identity key. The only link between the tables is the file location but Recording.filename only stores the filename whereas ContractRecording.fileLocation stores the full path. Yeah, I know but I didn't design these tables. Luckily, there is a pattern and the full path is derived from the Agent's name and the date of the recording, both of which we can know from data in the Recording table. But of course, there's another problem: the format for the file path changed a year or so ago and some recordings are stored under the old format and some under the new format.
Old Format: C:\John-Recordings\2015\06\15-0811.wav
New Format: C:\Recordings\John Smith\2015\06\15-0811.wav
The Problem:
In order to link the two tables, I have to join them on the full path of the recordings, which must be constructed manually on the Recording table and can be in one of two formats. I originally tried using an OR in the JOIN clause but that takes around 8 minutes to return back around 15k rows, which is not acceptable. I then tried using two LEFT OUTER JOINs - one for each condition - but that took ten minutes to pull what seemed to be the same data. I imagine that's because I'm joining on a custom field, which isn't indexed. Splitting it into two SELECTs and using a UNION resulted in duplicate rows each query would return one row for every recording. Do I have any other options to get this query down to under a few seconds? Here's my original query using the OR clause.
SELECT * FROM
(SELECT
cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,rec.name
,cr.note
,cr.filelocation
,rec.filename
,rec.recordtime
FROM ContractRecording cr
RIGHT OUTER JOIN
(SELECT
recordingid
,a.name
,filename
,retain
,r.recordtime
,'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathnew"
,'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathold"
FROM Recording r
JOIN Agents a
ON r.agentid = a.agentid) rec
ON cr.filelocation = rec.fullpathold OR cr.filelocation = rec.fullpathnew) main
ORDER BY main.name, main.recordtime
The report needs to show one row for all records in the Recording table (unless a single recording is attached to multiple contracts, in which case it should show one row per pairing), with data from ContractRecording displayed if there are any rows that match on either of the filelocation formats.
If absolutely necessary, I'm not opposed to just pulling all the data from both tables and linking them through code but that would be a last resort.
UPDATE:
As requested, here's the UNION version of the query for analysis. As mentioned, it returns two rows for every pairing - one with data and one without. This is because at least one of the two JOINs is always going to have no matches but I only want to ignore those when the other JOIN does have a match. If neither JOIN matches, I also only want to display it once. I'm less confident I can achieve my desired result with a UNION than I am with other possibilities so I haven't pursued this approach.
SELECT * FROM
((SELECT
cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,rec.name
,cr.note
,cr.filelocation
,rec.filename
,rec.recordtime
FROM ContractRecording cr
RIGHT OUTER JOIN
(SELECT
recordingid
,a.name
,filename
,retain
,r.recordtime
,'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathold"
FROM Recording r
JOIN Agents a
ON r.agentid = a.agentid) rec
ON cr.filelocation = rec.fullpathold)
UNION
(SELECT
cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,rec.name
,cr.note
,cr.filelocation
,rec.filename
,rec.recordtime
FROM ContractRecording cr
RIGHT OUTER JOIN
(SELECT
recordingid
,a.name
,filename
,retain
,r.recordtime
,'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathnew"
FROM Recording r
JOIN Agents a
ON r.agentid = a.agentid) rec
ON cr.filelocation = rec.fullpathnew)) main
ORDER BY main.name, main.recordtime
You can try using LIKE
If this helps any.. I would also try creating a temp table instead of using a cte and see if that helps more.
You can also try splitting the two OR statements into 2 cte's and using a union to combine the recording id's that were found
Your
UNION
needs to be in a sub select and then you can left join to that subquery