I am modeling (in Postgres 9.6.1 / postGIS 2.3.1) a booking system for local services provided by suppliers:
create table supplier (
id serial primary key,
name text not null check (char_length(title) < 280),
type service_type,
duration interval,
...
geo_position geography(POINT,4326)
...
);
Each supplier keeps a calendar with time slots when he/she is available to be booked:
create table timeslot (
id serial primary key,
supplier_id integer not null references supplier(id),
slot tstzrange not null,
constraint supplier_overlapping_timeslot_not_allowed
exclude using gist (supplier_id with =, slot with &&)
);
For when a client wants to know which nearby suppliers are available to book at a certain time, I create a view and a function:
create view supplier_slots as
select
supplier.name, supplier.type, supplier.geo_position, supplier.duration, ...
timeslot.slot
from
supplier, timeslot
where
supplier.id = timeslot.supplier_id;
create function find_suppliers(wantedType service_type, near_latitude text, near_longitude text, at_time timestamptz)
returns setof supplier_slots as $$
declare
nearpoint geography;
begin
nearpoint := ST_GeographyFromText('SRID=4326;POINT(' || near_latitude || ' ' || near_longitude || ')');
return query
select * from supplier_slots
where type = wantedType
and tstzrange(at_time, at_time + duration) <@ slot
order by ST_Distance( nearpoint, geo_position )
limit 100;
end;
$$ language plpgsql;
All this works really well.
Now, for the suppliers that did NOT have a bookable time slot at the requested time, I would like to find their closest available timeslots, before and after the requested at_time
, also sorted by distance.
This has my mind spinning a little bit and I can't find any suitable operators to give me the nearest tsrange.
Any ideas on the smartest way to do this?
The solution depends on the exact definition of what you want.
Schema
I suggest these slightly adapted table definitions to make the task simpler, enforce integrity and improve performance:
Save two
timestamptz
columnsslot_a
andslot_z
instead of thetstzrange
columnslot
- and adapt constraints accordingly. This treats all ranges as default inclusive lower and exclusive upper bounds automatically now - which avoids corner case errors / headache.Collateral benefit: only 16 bytes for 2
timestamptz
instead of 25 bytes (32 with padding) for thetstzrange
.All queries you might have had on
slot
keep working withtstzrange(slot_a, slot_z)
as drop-in replacement.Add an index on
(supplier_id, slot_z)
for the query at hand.And a spatial index on
supplier.geo_position
(which you probably have already).Depending on data distribution in
type
, a couple of partial indexes for types common in queries might help performance:Query / Function
This query finds the X closest suppliers who offer the correct
service_type
(100 in the example), each with the one closest matching time slot (defined by the time distance to the start of the slot). I combined this with actually matching slots, which may or may not be what you need.I did not use your view
supplier_slots
and optimized for performance instead. The view may still be convenient. You might includetstzrange(slot_a, slot_z) AS slot
for backward compatibility.The basic query to find the 100 closest suppliers is a textbook "K Nearest Neighbour" problem. A GiST index works well for this. Related:
The additional task (find the temporally nearest slot) can be split in two tasks: to find the next higher and the next lower row. The core feature of the solution is to have two subqueries with
ORDER BY slot_z LIMIT 1
andORDER BY slot_z DESC LIMIT 1
, which result in two very fast index scans.I combined the first one with finding actual matches, which is a (smart, I think) optimization, but may distract from the actual solution.