Impala Query: Combine multiple COUNT DISTINCT WHERE clauses

Question

1.5k views Asked by SummerEla At 08 June 2015 at 23:02

On impala, I'm trying to count the number of distinct sample id's that contain "101", "102" or "103".

Here is an example of my data:

| sample_id | 
 ___________
| 101-345-5 | 
| 101-345-6 | 
| 101-345-6 | 
| 102-345-5 | 
| 103-345-5 | 
| 103-345-8 | 
| 103-345-8 |

I'd like to get a count of how many distinct sample id's are in each study group:

| Study | Count | 
 _______________
| 101   |   2   | 
| 102   |   1   | 
| 103   |   2   |

I can easily create separate queries to find the number in each group:

SELECT COUNT(DISTINCT ill.sample_id) as 101_count
FROM illumina_data ill
WHERE SUBSTRING(ill.sample_id, 1,3) = "101"

But I'd really like to figure out how to use a single query to find a count for 101, 102 and 103.

There are 1 answers

**Barbara Laird** · Accepted Answer · 2015-06-09T00:20:26+00:00

Barbara Laird On 09 June 2015 at 00:20 BEST ANSWER

SELECT SUBSTRING(sample, 1,3) , COUNT(DISTINCT sample)
FROM samples 
group by SUBSTRING(sample, 1,3);