I have a table MyTable
with values that look like this:
| id | name | type | category |
---------------------------------------------------
| 1 | Rob | Red | Rock |
| 2 | Rob | Blue | Rap |
| 2 | Rob | Blue | Rock |
| 3 | Jane | Green | Country |
| 3 | Jane | Green | Rap |
| 4 | Meg | Yellow | Rock |
| 5 | Jane | Blue | Rap |
| 5 | Jane | Blue | Rock |
| 6 | Jane | Red | Country |
| 6 | Jane | Red | Rock |
| 7 | Rob | Red | Rap |
| 7 | Rob | Red | Country |
| 8 | Meg | Green | Country |
| 9 | Meg | Blue | Rap |
Now, my issue resides in the fact that (as the data is given to me), there are duplicate ids. Each id stands for a report, so id of 1 is for report 1. In this report, there is a name
, type
, and category
. The report can only have one type, but as many categories as it likes. Hence, the duplicate ids come from there being different categories, each constituting a new row. The end result i wish to achieve it to list the names in one row, along with all the types + count (where count is the count of the distinct types, as in one type per report) in the next row. It would look like such:
| Rob | Red(2), Blue(1) |
| Jane | Green(1), Blue(1), Red(1) |
| Meg | Yellow(1), Green(1), Blue(1)|
Now, I've developed a query that actually uses two subqueries and successfully achieves this result. It goes
select name as firstCol, group_concat(type_counts order by countName desc separator ', ') as secondCol from
(select name, concat(type,' (',count(name),')') as type_counts, count(name) as countName from
(select name, type from
some join stuff
where (date_reported between '2014-11-01' and '2014-11-31')
group by id order by type, name) a
group by name, type order by name, count(name) desc) a
group by name;
This query essentially groups by id first, to remove the duplicate ids and disregard the split due to differing categories. The query wrapping it then groups by name and traffic type, concatenating the traffic type and the count of names together as "type (count)". The third groups solely by name and group concats all of the types to have a column for the name, and then a second column with all of the type+counts listed and separated by commas. I was just wondering...is there a query that could make it faster, by not having to use so many subqueries and such? Thanks.
Use a subquery as an expression in the SELECT clause and another in the WHERE clause. For example:
References
Users with their own Silver Tag Badges - Stack Exchange Data Explorer
Relational Algebra and SQL (pdf)
A Gentle Introduction to SQL