(SQL2014 if that makes a difference)
Say I have [tableA]
id ResultID SampleID ERRORCODE col4 colN
1 9001 1100 0 ... ...
2 9002 1100 100 ... ...
3 9003 1100 200 ... ...
4 9004 1100 300 ... ...
5 9005 1101 0 ... ...
6 9006 1101 0 ... ...
7 9007 1101 0 ... ...
8 9008 1101 0 ... ...
9 9009 1102 0 ... ...
10 9010 1102 100 ... ...
11 9011 1102 200 ... ...
12 9012 1102 0 ... ...
and I want to produce a result that only shows the sample sets (identified by common SampleID
vals) that contain all of errorcodes 0, 100, 200 & 300. i.e. the above would reduce to:
id ResultID SampleID ERRORCODE col4 colN
1 9001 1100 0 ... ...
2 9002 1100 100 ... ...
3 9003 1100 200 ... ...
4 9004 1100 300 ... ...
So I'm needing a query that looks at multiple rows at one time and downselects groups that have (i) certain key values in a column, here thats [ERRORCODE]
and (ii) consistent values in another column, here thats [SampleID]
. I've looked at:
Query with multiple IN clause on multiple rows
But didn't have any joy. The subquery that worked for oliboon is only working on one row for me. Olga's code didn't work at all and Aushin's produced unexpected results (and removing half the syntax didn't change them)!
N00b to SQL, so I'm a bit lost!
The "table" keyword used in a few of those solutions listed doesn't seem to work for me - and it may be that those answers were intent on creating a subtable which was then further parsed in subqueries for their solution. All I get is an "incorrect syntax near the keyword table" error if I try anything like:
select distinct SampleID from table [my].[db].[path].[tableA]
I thought a query of the format
SELECT *
From [tableA]
where
[SampleID] in (Select [SampleID] from [tableA] where [ERRORCODE] = 0) and
[SampleID] in (Select [SampleID] from [tableA] where [ERRORCODE] = 100)
Would have worked, but it only returns a result if the two [ERRORCODE] checks are for the same code, i.e. 100. Which of course is useless. Its checking line by line rather than doing the first part of the logic gate, then the 2nd.
If I was able to make the first part of the where [ERRORCODE]=0 complete, then downselect from those SampleID's where [ERRORCODE]=100 and repeat, then that'd work. Not sure how to do that though.
edit: Ach FFS. Turns out every single errorcode I was looking must be mutually exclusive with each other - no matter what combination, no two could occur together. I'd assumed that in the size of the DB I had, there would have been a combination somewhere.
I checking my query with errorcodes I can see from an unqualified SELECT * that sit beside each other and did prove it works.
Question is invalid I suppose.
I'm a little lost. If you want samples with all four error codes, then this should do what you want:
This should work, although the question claims that it does not. It is not necessarily going to have the best performance.
I usually recommend aggregation to et the sample ids:
If you don't want duplicates as well as covering all the error codes, then use:
This also may not have the best performance in all cases. But the performance is predictable -- it changes little based on the number of codes you are looking for. And the
having
clause can make this quite versatile.Then, if you want all the original data, you can use
join
,in
, orexists
: