I have this query (edited for simplicity):
select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000
order by s.sampletimestamp;
I notice that this executes much more quickly if I enter the values for the time manually:
select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= 1376143200000 and
s.sampletimestamp < 1376229600000
order by s.sampletimestamp;
Here the times are epoch timestamps in milliseconds. My guess is that the database is evaluating the extract(EPOCH ...)
part for every record, whereas it only really needs to do that once.
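For reference, EXPLAIN ANALYZE shows the actual plan and run time of each form, e.g. for the first query:

explain analyze
select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000
order by s.sampletimestamp;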
Is there some way to keep the more human-readable form of the first query while keeping it as efficient as the second?
I am new to PostgreSQL (and entirely self-taught), so I think the problem I'm mostly suffering from is not knowing the particular keyword I should be putting into Google - which I've already tried, along with the PostgreSQL documentation.
Thanks in advance :)
EDIT1: Thanks for the very detailed replies. I suspect I'm probably in a different time zone from most of the respondents - I'll provide experimental evidence for this tomorrow (it's very late here).
EDIT2: To summarise the answer below, casting to bigint does the trick. Replace:
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000
With:
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')::bigint*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')::bigint*1000
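As a quick sanity check, the cast form reproduces the constant used in the hand-entered query:

select extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')::bigint*1000;
-- 1376143200000, the same literal as in the fast query above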
What's happening here is that extract is implemented using the date_part function, and date_part(text, timestamptz) is defined as stable, not immutable. I'm pretty sure that'll prevent Pg from pre-computing the value and inlining it into the call; I'd need to dig deeper to be sure.
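You can check the volatility markings yourself with a quick catalog query (a minimal sketch; works on any recent PostgreSQL):

select p.oid::regprocedure as signature,
       p.provolatile  -- 'i' = immutable, 's' = stable, 'v' = volatile
from pg_proc p
where p.proname = 'date_part';

The timestamptz overload comes back 's' (stable), while the plain timestamp overload is 'i' (immutable) - which matters below.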
I believe the reasoning is that date_part on a timestamptz can be dependent on the value of the TimeZone setting. That isn't true for date_part('epoch', some_timestamptz), but the query planner doesn't understand at planning time that you're using that. I'm still surprised that it doesn't get pre-computed, as the documentation states that the stable category allows the optimizer to optimize multiple calls of the function down to a single call within a statement.
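To see why the timestamptz overload has to be stable, note that the same call returns different results under different TimeZone settings - except for the epoch field (an illustration; any two zones will do):

set timezone = 'UTC';
select date_part('hour', timestamptz '2013-08-11 00:00:00+10');  -- 14
set timezone = 'Australia/Sydney';
select date_part('hour', timestamptz '2013-08-11 00:00:00+10');  -- 0
select date_part('epoch', timestamptz '2013-08-11 00:00:00+10'); -- 1376143200 either way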
You can work around this apparent limitation by converting to a timestamp at UTC (or whatever TZ the epoch of your times is in) first, with AT TIME ZONE 'UTC', as in the sketch below. This executes faster, though there's more of a time difference than I'd expect if it were just being calculated once.
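A sketch of that rewrite against the original query (the AT TIME ZONE conversion yields a plain timestamp, whose date_part overload is immutable):

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000
order by s.sampletimestamp;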
It would be possible to remove this query optimizer limitation / add a feature to optimize this. The optimizer would need to recognize, probably at parse time, that extract('epoch', ...) is a special case and, instead of invoking date_part('epoch', ...), invoke a special timestamptz_epoch(...) function that was immutable.

A bit of looking at perf top results while each query runs shows different peaks for the plain timestamptz case, the AT TIME ZONE case, and the integer case.
So you can see that the AT TIME ZONE version avoids the repeated timestamptz_part and datebsearch calls. The main difference between it and the integer case is float8lt; it looks like we're doing double precision comparisons instead of integer comparisons. Sure enough, a cast takes care of it.
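A sketch of that final form, with the cast applied to the AT TIME ZONE version (extract returns double precision here, so the bigint cast keeps the comparison in integers, matching what is presumably a bigint column):

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')::bigint*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')::bigint*1000
order by s.sampletimestamp;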
I don't have the time to pursue the enhancement to the optimizer discussed above at present, but it's something you might want to consider raising on the mailing list.