Django. Database query: distinct for one field


I have the following fields and data in the DB:

FirstName     LastName    Date

John          Davis       10-10-2011
Joe           Gray        20-09-2011
Ann           Davis       03-04-2010
Ann           Bovis       01-04-2010

How can I select the 3 items from the DB with the latest dates and unique last names, with all other fields included in the result? That is, if more than one item has LastName Davis, the result should contain only the one with the latest date. I don't even know how to do it in pure SQL, and in Django it looks impossible.

EDIT1. SQL QUERY.

I wrote the query in pure SQL:

SELECT TOP 3 *
FROM peopleTable AS a1
WHERE (SELECT COUNT(*) FROM peopleTable AS a2 WHERE a2.LastName = a1.LastName AND a2.Date > a1.Date) <= 0
ORDER BY Date DESC

I guess it would be better to run this query in Django as raw SQL.

3

There are 3 answers

2
Jordan Reiter On BEST ANSWER

You can do this:

latest date and unique last name

or

all other fields

but you can't do both. If you want to get the most recent date for last name Davis, which other fields should be retrieved, firstname=John or firstname=Ann?

I am guessing that you want the result to look like this:

FirstName     LastName    Date

John          Davis       10-10-2011
Joe           Gray        20-09-2011
Ann           Bovis       01-04-2010

Where the Ann Davis row is gone because she isn't the most recent Davis in the table. Correct?

In which case your raw SQL looks something like:

SELECT FirstName, LastName, Date
FROM Table T1
WHERE NOT EXISTS (
    SELECT * FROM Table T2 WHERE T2.LastName = T1.LastName AND T2.Date > T1.Date
)

You can run this in Django using the Item.objects.raw() option. You can also use extra(where=…)
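To check the NOT EXISTS approach outside Django, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The table name `people`, the ISO date strings (chosen so that string comparison matches date order), and the added ORDER BY / LIMIT 3 (SQLite's counterpart to the TOP 3 in the question) are assumptions for illustration, not part of the original query.

```python
import sqlite3

# In-memory database with the sample data from the question.
# Dates are stored as ISO strings (assumption) so that ">" compares them chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (FirstName TEXT, LastName TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("John", "Davis", "2011-10-10"),
        ("Joe", "Gray", "2011-09-20"),
        ("Ann", "Davis", "2010-04-03"),
        ("Ann", "Bovis", "2010-04-01"),
    ],
)

# Keep a row only if no other row with the same last name has a later date.
rows = conn.execute(
    """
    SELECT FirstName, LastName, Date
    FROM people AS T1
    WHERE NOT EXISTS (
        SELECT 1 FROM people AS T2
        WHERE T2.LastName = T1.LastName AND T2.Date > T1.Date
    )
    ORDER BY Date DESC
    LIMIT 3
    """
).fetchall()

for row in rows:
    print(row)
```

If two rows share both the last name and the latest date, both survive the filter, so a tie-breaker would be needed in that case.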

3
Chris Pratt On

You can't really do it with a query; it's just too much logic for SQL. However, based on the fact that dictionary keys are unique, you can get what you want with a little ingenuity.

First, get the queryset ordered by date ascending. This may seem counter-intuitive because you want the latest date, but it'll make sense in the next steps:

qs = SomeModel.objects.order_by('date')

Next, we'll use this in a list comprehension to create (key, value) tuples (dictionary comprehensions are available in Python 2.7 and 3+, but this form also works on older versions, so I'm detailing it this way):

qs_list = [(item.last_name, item) for item in qs]

Finally, we convert this list of tuples into a dictionary:

items = dict(qs_list)

You'll now have a dictionary where each last_name key is unique. Since dictionary keys must be unique, the last value of each duplicated last name was the one that actually made it in. Since the list was ordered by date ascending, the last value was the "latest".

You can convert this dictionary back into a straight list or just iterate over it as it is. The only catch is that you're no longer dealing with a QuerySet, so you can't do any further filters, etc. Just keep that in mind and do these steps last after you've fully built your query.

And, of course, you can do that all as a one liner if you're so inclined:

items = dict([(item.last_name, item) for item in SomeModel.objects.order_by('date')])
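The same idea can be tried without a database. The sketch below is only an illustration: the `Person` namedtuple is a hypothetical stand-in for model instances, and `sorted()` stands in for `order_by('date')`. The final step, which takes the three most recent entries, is an addition to match the "3 items" in the question.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for model instances (not a Django model).
Person = namedtuple("Person", "first_name last_name date")

people = [
    Person("John", "Davis", date(2011, 10, 10)),
    Person("Joe", "Gray", date(2011, 9, 20)),
    Person("Ann", "Davis", date(2010, 4, 3)),
    Person("Ann", "Bovis", date(2010, 4, 1)),
]

# Equivalent of order_by('date'): oldest first.
ordered = sorted(people, key=lambda p: p.date)

# Later entries overwrite earlier ones with the same key,
# so each last name ends up mapped to its most recent row.
latest_by_last_name = {p.last_name: p for p in ordered}

# Take the three most recent of the de-duplicated rows.
top3 = sorted(latest_by_last_name.values(), key=lambda p: p.date, reverse=True)[:3]

for person in top3:
    print(person)
```

Note that this pulls every row into Python before de-duplicating, which may matter on large tables.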
1
second On

see the docs on aggregation

guess you want something like

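from django.db.models import Max

# Group by lastname and take the maximum date and firstname per group.
Item.objects.values("lastname").annotate(
    date=Max("date"), firstname=Max("firstname")
).order_by()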

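you have to aggregate over or group by all fields included in an aggregate query. in your case you want max(date) and group by lastname. you have to choose some aggregation for firstname as well, say max. note that max(firstname) is computed independently of max(date), so it is not necessarily the first name from the row with the latest date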

note that there seems to be a postgres bug on aggregating over charfields
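As a rough illustration of what grouping by lastname with two independent MAX aggregates returns, here is a sketch in plain SQL run through Python's sqlite3 module. The table name `people`, the ISO date strings, and the extra "Zoe Davis" row are assumptions added for the demonstration and are not in the question's data. The SQL is an approximation of what such a grouped query corresponds to, not the exact statement the ORM emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT, lastname TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("John", "Davis", "2011-10-10"),
        ("Joe", "Gray", "2011-09-20"),
        ("Ann", "Davis", "2010-04-03"),
        ("Ann", "Bovis", "2010-04-01"),
        ("Zoe", "Davis", "2009-01-01"),  # hypothetical row, added to expose the mismatch
    ],
)

# Each MAX is evaluated on its own column within the group.
grouped = conn.execute(
    """
    SELECT lastname, MAX(date), MAX(firstname)
    FROM people
    GROUP BY lastname
    ORDER BY lastname
    """
).fetchall()

for row in grouped:
    print(row)
```

With the extra row, the Davis group pairs the latest date (John's) with the alphabetically last first name (Zoe), so the aggregated values do not describe a single real row.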