Django - Time aggregates of DatetimeField across queryset

1.7k views Asked by At

(using django 1.11.2, python 2.7.10, mysql 5.7.18)

If we imagine a simple model:

class Event(models.Model):
    happened_datetime = DateTimeField()
    value = IntegerField()

What would be the most elegant (and quickest) way to run something similar to:

res = Event.objects.all().aggregate(
    Avg('happened_datetime')
)

But that would be able to extract the average time of day for all members of the queryset. Something like:

res = Event.objects.all().aggregate(
    AvgTimeOfDay('happened_datetime')
)

Would it be possible to do this on the db directly?, i.e., without running a long loop client-side for each queryset member?

EDIT:

There may be a solution, along those lines, using raw SQL:

select sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) from event_event;

Performance-wise, this runs in 0.015 second for ~23k rows on a laptop, not optimised, etc. Assuming that could yield accurate/correct results and since time is only a secondary factor, could I be using that?

2

There are 2 answers

0
Hal On BEST ANSWER

So after a little search and tries.. the below seems to work. Any comments on how to improve (or hinting as to why it is completely wrong), are welcome! :-)

res = Event.objects.raw('''
SELECT id, sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) AS average_time_of_day
FROM event_event
WHERE happened_datetime BETWEEN %s AND %s;''', [start_datetime, end_datetime])

print res[0].__dict__
# {'average_time_of_day': datetime.time(18, 48, 10, 247700), '_state': <django.db.models.base.ModelState object at 0x0445B370>, 'id': 9397L}

Now the ID returned is that of the last object falling in the datetime range for the WHERE clause. I believe Django just inserts that because of "InvalidQuery: Raw query must include the primary key".

Quick explanation of the SQL series of function calls:

  1. Extract HH:MM:SS from all datetime fields
  2. Convert the time values to seconds via time_to_sec.
  3. average all seconds values
  4. convert averaged seconds value back into time format (HH:MM:SS)

Don't know why Django insists on returning microseconds but that is not really relevant. (maybe the local ms at which the time object was instantiated?)

Performance note: this seems to be extremely fast but then again I haven't tested that bit. Any insight would be kindly appreciated :)

2
Risadinha On

Add another integer field to your model that contains only the hour of the day extracted from the happened_datetime.

When creating/updating a model instance you need to update this new field accordingly whenever the happened_datetime is set/updated. You can extract the hours of the day for example by reading datetime.datetime.hour. Or use strftime to create a value to your liking.

Aggregation should then work as proposed by yourself.

EDIT:

Django's ORM has Extract() as a function. Example from the docs adapted to your use case:

>>> # How many experiments completed in the same year in which they started?
>>> Event.objects.aggregate(
...    happenend_datetime__hour=Extract('happenend_datetime', 'hour'))

(Not tested!) https://docs.djangoproject.com/en/1.11/ref/models/database-functions/#extract