TSQL Efficient Spatial Query in tiled scenarios

32 views Asked by At

I've got a large table of spatial data representing roads. The data contains a BeginNetworkId and an EndNetworkId which allows us to view the data over time. I.E. When the latest networkId was 2 for example. Additionally a RouteHierarchy column allowing me to filter out particular road types.

The way the data is maintained means that for a given spatial area, I may have some records with an Id < 10,000 and some with Id > 10,000,000.

I'm trying to write a query to provide the data for a tiled service, loading increasing detail as you zoom in - i.e. motorways at the zoom out view and minor roads at the most zoomed in level. Despite the filters I'm seeing the performance is very slow on the larger tiles.

SELECT 
    r.*
FROM [RoadLinks] AS r with(index([IX_RoadLinks_Geometry]))
WHERE r.[Geometry].STIntersects(geometry::STGeomFromText('POLYGON ((528601 164864, 528601 172032, 535769 172032, 535769 164864, 528601 164864))', 27700)) = 1
AND r.[BeginNetworkId] <= 2
AND r.[EndNetworkId] >= 2
AND r.[RouteHierarchy] IN (0,1,2,3)

We noticed that SQLServer doesn't seem to want to use it's spatial index unless we explicitly force it to. Hence the table hint.

I can see from the execution plan, the spatial index is taking 0.111s and returns about 20,000 records. Because the spatial index cannot include any other columns, its having to join the rows from the clustered index before evaluating the BeginNetworkId, EndNetworkId and RouteHierarchy predicates. As a result it's performing 20,000 Clustered Index Seeks which takes several orders of magnitude more time.

This doesn't seem to scale well when the area is large or the table is large. enter image description here

Note; I've tried updating my statistics and there is very little fragmentation in my index.

I've seen this article, but I can't believe that this is necessary in order to efficiently query the data in this way. Is there a better way to do this that does scale to larger bbox regions and larger tables?

1

There are 1 answers

0
Lawrence Phillips On

For anyone else stuck on the same issue; this is how I resolved it. Not sure if its the best way, but it worked for me.

To solve this I’m storing and indexing a field that is spatially common amongst the road links in a particular area. I’ve used the OS grid reference system for this, creating a table of every 10sqkm tile within the country and storing against each road link, which 10sqkm tile the link mostly resides within. This way we can query the grid reference table to find a small number of tiles and join to the road links table to obtain the records by grid reference and then subsequently filter them further.

This will always perform a nested loop using the grid reference column. However for the same area produces significantly less iterations due to selecting a 10sqkm tile rather than the row ids.

For each iteration of the loop, all of the road links for that tile is roughly continuous within the index so we see a much better and more scalable performance.

enter image description here In this example, the nested loop is only doing 100 iterations covering 100sqkm tiles.

DECLARE @g as Geometry
SET @g = geometry::STGeomFromText('POLYGON ((408601 104864, 408601 192032, 495769 192032, 495769 104864, 408601 104864))', 27700)
SELECT 
    r.RoadLinkId,
    r.GridReference,
    r.OSId,  
    r.RouteHierarchy,
    r.IsTrunkRoad,
    r.IsPrimaryRoute,
    r.Geometry,
    r.Name,
    r.LocalCode,
    r.NationalCode
FROM [dbo].[GridReferenceTiles] t
INNER LOOP JOIN [dbo].[RoadLinks] r WITH(NOLOCK)
    ON r.GridReference = t.GridReference
    AND r.[RouteHierarchy] IN (0,2,3,4)
    AND r.[EndNetworkId] >= 1
    AND r.[BeginNetworkId] <= 1
    AND r.[Geometry].STIntersects(@g) = 1
WHERE t.[Geometry].STIntersects(@g) = 1

In some scenarios SQL Server tries to scan all the road links and hash match them, which is actually slower than a nested loop, so I've ended up forcing an execution plan using a nested loop as the index is designed.

In this example I've used a covering non-clustered index with the grid reference as the first index column. It could also be done as a Clustered Index, but you would need to create a non-clustered index for the PK otherwise selection for a single record would be incredibly slow.

I'd be interested in feedback and thoughts on this approach.