Impala Query: Combine multiple COUNT DISTINCT WHERE clauses

1.6k views Asked by At

On impala, I'm trying to count the number of distinct sample id's that contain "101", "102" or "103".

Here is an example of my data:

| sample_id | 
 ___________
| 101-345-5 | 
| 101-345-6 | 
| 101-345-6 | 
| 102-345-5 | 
| 103-345-5 | 
| 103-345-8 | 
| 103-345-8 | 

I'd like to get a count of how many distinct sample id's are in each study group:

| Study | Count | 
 _______________
| 101   |   2   | 
| 102   |   1   | 
| 103   |   2   | 

I can easily create separate queries to find the number in each group:

SELECT COUNT(DISTINCT ill.sample_id) as 101_count
FROM illumina_data ill
WHERE SUBSTRING(ill.sample_id, 1,3) = "101" 

But I'd really like to figure out how to use a single query to find a count for 101, 102 and 103.

1

There are 1 answers

0
Barbara Laird On BEST ANSWER

A group by will do the trick http://sqlfiddle.com/#!9/1d75f/6

SELECT SUBSTRING(sample, 1,3) , COUNT(DISTINCT sample)
FROM samples 
group by SUBSTRING(sample, 1,3);